Researchers: show 
world leaders how 
to behave in a crisis 


Scientists are dropping everything to 
team up and fight COVID-19. Presidents 
and prime ministers should, too. 


Although the coronavirus pandemic has become 
a threat to every country on Earth, world leaders 
are all at sea, showing few signs that they 
wish to cooperate genuinely to combat it. By 
contrast, tens of thousands of researchers from 
different disciplines and countries have joined research and 
public-health efforts to fight COVID-19 (see page 13). They 
are working across continents, lending their time, ideas, 
expertise, equipment and money to the emergency pub- 
lic-health effort. They are providing virus testing facilities; 
donating personal protective equipment; designing and 
manufacturing ventilators and other breathing apparatus. 
And when it comes to the research effort itself, thousands 
of volunteers from all over the world are enthusiastically 
signing up to say they are available to do what they can. 

University-based laboratories such as those at the Broad 
Institute of MIT and Harvard in Cambridge, Massachusetts, 
and at the National University of Colombia in Bogota, are 
carrying out COVID-19 tests. That said, more universities 
with medical schools need to provide access to virus test- 
ing facilities. 

The emergency response to the pandemic is also cre- 
ating new types of collaboration. For example, research- 
ers and clinicians in the United Kingdom, China and Italy 
have been working at speed with engineers from Formula 
1 motor racing. In the space of a week, they have managed 
to reverse-engineer a device that helps people with serious 
lung infections to breathe more easily. 

The breathing aid uses a method known as continuous 
positive airway pressure. It works by supplying people 
experiencing breathing difficulties with relatively small 
but continuous amounts of air, and it has the potential 
to reduce the numbers of people needing ventilators in 
hospitals. We urge the project’s partners to publish and 
share their designs so that the device can be tested globally, 
and so that it can eventually be made available to health 
authorities in low- and middle-income countries. 

The COVID-19 research effort also got a welcome 
boost. Researchers from around the world have set up 
an online platform for those who want to volunteer for 
research-related tasks. The platform, Crowdfight COVID- 
19, matches volunteers to researchers who have specific 
tasks or needs — anything from transcribing data from 
notebooks and searching the literature, to providing specific 
expertise. As this editorial went to press, Crowdfight 
COVID-19 had attracted more than 35,000 volunteers. 

Clinicians and automotive engineers are jointly developing breathing aids. 

These efforts are important because world leaders need 
to see that international coordination on COVID-19 is thriving. 
Presidents and prime ministers are moving too slowly, 
in stark contrast to their response to the financial crisis of 
2008, when heads of government, ministries of finance, 
central banks and other multilateral lending agencies got 
together and agreed what needed to be done. 

Although different funding agencies are collaborating 
on coronavirus research, there is less consensus at the 
highest levels of government, and most countries seem 
to be making independent decisions about how to protect 
their citizens. 

As infections and deaths continue to rise, it is only a 
matter of time before world leaders will have to step up. 
They have no choice, because there’s little point in extin- 
guishing the virus in one country when it’s exploding 
elsewhere. A genuinely global response is needed — and 
world leaders must follow the fine example being set by 
researchers. 
Women’s History 
Month: celebrate 
more researchers 


This year’s event was derailed because of 
COVID-19. Next year, let’s hear about more 
female scientists, clinicians and engineers. 


Last week, academic and performance artist 
Colleen Webster was looking forward to doing 
her one-woman show on the life of the biologist, 
science writer and environmental pioneer Rachel 
Carson. “She was shy. She was humble. Devoted to 
family. Committed to research and to protecting nature. 
That, and she changed the world,” Webster told Nature 
from her home in Maryland. 

With coronavirus raging across the United States, 
Webster’s performance at Harford Community College 
in Bel Air, Maryland, had to be postponed. Instead, she 
recorded a short video, with a promise to be back performing 
live as soon as conditions allow. 

Had it gone ahead, the play would have been one of 
hundreds of events during Women’s History Month, com- 
memorating and celebrating women’s contributions to 
society. Women often have to fight at great cost to make 
themselves heard, and in some cases their achievements 
are overlooked, underplayed, denied or undermined by 
male colleagues — and by some historians, too. 

Today’s environmental regulators — including ministries 
and environmental-protection agencies — can trace some 
of their lineage to the movement inspired by warnings in 
Carson’s 1962 book Silent Spring. But Carson endured 
persistent personal and sexist attacks from the chemical 
industry and from elected politicians who supported the 
industry. The attacks also questioned the careful research 
that had led to her landmark conclusion that the pesticide 
DDT was killing not only insects, but also the birds that 
feed on those insects. 

The severity of the attacks prompted a book review in 
Nature to call for an end to “impugning her scientific quality” 
(C. W. Hume Nature 198, 117; 1963). The review added: 
“She rests her case not on vague generalizations but on 
concrete instances, and authenticates it with forty-eight 
pages of references to scientific literature.” 

Prejudice was a constant in the life of double Nobel 
laureate Marie Curie, too, as the British actor Rosamund 
Pike powerfully demonstrates in the biopic Radioactive. 
The film premiered in London on 8 March — International 
Women’s Day — but, sadly, it is likely that few people will be 
able to see it on the big screen, because cinemas worldwide 
are closing their doors. 

But it isn’t only well-known scientists whose recognition 
is lacking during this event. More also needs to be done 
to highlight the contributions of women from low- and 
middle-income regions, and those from under-represented 
or minority groups in their countries. The lack of such 
recognition is surprising, considering the hard work being 
done to update Wikipedia pages with profiles of female 
researchers, as well as the increasing trend to call out 
sexism and discrimination in science, past and present. 

Silent Spring author Rachel Carson helped inspire the global green movement. 

In the United States, the main coalition of organizations 
behind Women’s History Month — including the Smith- 
sonian Institution, the US National Archives and Records 
Administration and the US National Endowment for the 
Humanities — have research in their DNA. It shouldn't 
be beyond them to more actively and strategically high- 
light a wider range of contributions and achievements 
from women in research, to feature alongside the other 
professions. 

At a minimum, they need to make it easier for readers 
to find researchers or scientists on the official Women’s 
History Month website (go.nature.com/2xysrj4). At pres- 
ent, the website hosts links to exhibitions and collections 
highlighting researchers, but a reader searching for female 
researchers would have to scroll through pages of individual 
entries from many different professions. And often, these 
links are to existing content — rather than bespoke material. 

The cancellation or postponement of many events cele- 
brating Women’s History Month because of the coronavirus 
is undoubtedly a setback. As planning begins for next year’s 
events, institutions should do more to identify and cele- 
brate women who made important contributions to discov- 
ery, invention and innovation. And Nature is keen to help. 

Women are making history right now as they work 
intensely — increasingly as equal partners — in the global 
effort to research and understand the devastating corona- 
virus pandemic. Let’s ensure that the achievements of 
all women are recognized, recorded and hopefully one 
day celebrated, so that the history being made today is 
recorded more accurately than how it was in the past. 
A personal take on science and society 


World view 


By Ian Boyd 


We practised for a pandemic, but didn’t brace 


Unheeded lessons from simulations of health 
and other disasters could still assist recovery. 


Planners have known that something like COVID-19 
would come, even if they could never be sure when 
or from where. It is hard for politicians to garner 
the social licence to prepare for catastrophes that 
people see as unlikely and far from their daily lives. 

From 2012 to 2019, I was a chief scientific adviser — a 
technocratic expert — in the UK government. When an 
emergency did happen, such as the release of a nerve agent 
in the city of Salisbury in 2018, I knew that real people might 
die if I made mistakes. 

I took part in simulated exercises to prepare my country 
for the practical, economic and social shock waves from 
rare but devastating events — volcanic eruptions that affect 
whole hemispheres, meteor strikes, zoonotic epidemics 
and other calamities. I recall a practice run for an influenza 
pandemic in which about 200,000 people died. It left me 
shattered. 

We learnt what would help, but did not necessarily 
implement those lessons. The assessment, in many sec- 
tors of government, was that the resulting medicine was so 
strong that it would be spat out. Nobody likes living under 
a fortress mentality. 

Two messages were clear. First, that we were poorly 
prepared. Second, that governments would quickly be 
called on to cover the damage. They are the insurers of last 
resort, even if they rarely quantify and plan for those risks. 
Our experience of COVID-19 is showing just how true this 
is, and suggests what we should do once recovery begins. 

My experiences also highlighted two priorities. One is that 
the teams fighting COVID-19 need resilience. Health-care 
specialists are the most vulnerable, but people throughout 
government are under strain. Politicians, specialists and 
others must cope with mental exhaustion, something most 
people never experience or witness. They are just flesh and 
blood, with family to worry about, and they get sick. Indeed, 
we have already seen politicians and government experts 
across the world fall ill. We need contingency plans to keep 
government functional at all levels. 

The other priority is getting people to respond well to 
interventions, especially changes to routine. This is one of 
the biggest unknowns in these scenarios, and yet compliance 
can be the most crucial factor in determining whether 
an intervention works. Balancing lockdown against the 
social licence to act as one sees fit is essential. Ideally, 
policy implementers would land the perfect response at 
the first try, but this almost never happens. It’s messy and 
full of uncertainty. This can make the government seem 
indecisive, as we’ve seen when some governments have shifted their 
position on the role of herd immunity or on whether alcohol 
outlets count as ‘essential services’ that can remain open 
when other shops close. But being flexible is actually a 
strategic imperative. 

After the first few days (if that) of emergencies such as 
COVID-19, there is no manual to follow. It is important to 
learn from previous epidemics, as well as to respond to 
the evidence emerging in real time. But every country will 
require its own approach, even if the same epidemiological 
principles for reducing transmission apply everywhere. 
Plans provide only a template — command and control, 
reserve personnel and resources, computer models and 
communications tools and so on. How these are deployed 
and coordinated needs to be highly fluid. 

The priority in all instances is to get ahead of this 
fast-moving disease. This means looking beyond purely 
technical measures such as tests, therapies and vaccines. 
A successful response uses social forces such as peer pressure 
and altruism to help people adapt to changing circumstances. 
It delivers messages and support that promote 
self-reliance rather than encourage people to fall back on 
stressed state support. Cultures and communities used to 
providing some of their own services (for example, neigh- 
bourhood-watch programmes) often fare better than those 
accustomed to relying on state support. 

Recovery, when it eventually happens, is going to bring 
fresh challenges. In New Orleans, Louisiana, flooding led 
to long-term mental-health problems; in Salisbury, it took 
more than a year of diligent cleaning to return parts of the 
city to public use. 

While I was chief scientific adviser at the UK Department 
for Environment, Food and Rural Affairs, much of the talk 
while planning for Brexit was about keeping food, drugs, 
fuel and so on available. It showed how little we knew about 
those vulnerabilities. If they fail, many of our life-support 
systems, such as the water supply, fail, too. Ironically, Brexit 
planning might help the United Kingdom to tackle the 
greater challenge of COVID-19. We need to start planning 
now for how we will rebuild. 

Despite financial safety nets spread out by some 
governments, we should not expect our systems of 
resources and service provision to bounce back to the 
way they were. COVID-19 is bringing systemic change on 
a global scale. 

Having been faced with our weaknesses, I hope we see a 
shift in values so that we are less likely to continue with our 
unsustainable rates of resource consumption, assumptions 
that there will always be a benevolent government to fall 
back on and disregard for vulnerabilities attributable to 
climate change. 

COVID-19 might be just a wake-up call: let’s use it to 
rebuild our systems into something more resilient. 
The world this week 


News in brief 


TENS OF THOUSANDS 
OF SCIENTISTS JOIN 
THE FIGHT AGAINST 
CORONAVIRUS 


As they shutter their labs 
indefinitely, tens of thousands 
of researchers are volunteering 
to help fight the pandemic in any 
way they can. Working around 
the clock, scientists at the Broad 
Institute of MIT and Harvard 
in Cambridge, Massachusetts, 
can run about 2,000 COVID-19 
tests per day. In 
places where testing is still 
scarce (which is to say, in much 
of the world), their actions can 
provide crucial relief to public- 
health systems stretched to 
their limits. 

Universities are organizing, 
researchers are banding 
together, and efforts to get 
volunteers and equipment 
where they are most needed are 
in progress around the globe. 
“All of the people who are now 
suddenly not working have 
skills that can be applied,” says 
Michael Monaghan, a molecular 
ecologist at the Leibniz Institute 
of Freshwater Ecology and 
Inland Fisheries in Berlin. 

The Association of American 
Universities, a consortium 
of 65 leading US research 
institutions based in Washington 
DC, has used Twitter to urge its 
community members to donate 
spare personal protective 
equipment to hospitals and 
medical facilities. Many have 
heeded the call. 


BUSH-FIRE SMOKE 
LINKED TO HUNDREDS 
OF AUSTRALIA DEATHS 


Researchers estimate that smoke 
pollution probably killed more 
than 400 people from November 
to February during the 
unprecedented bush fires across 
southeast Australia. Thirty-three 
people were killed in incidents 
directly related to the fires. 
Air-pollution researcher 
Fay Johnston at the University 
of Tasmania in Hobart led a 
team that collected data on the 
average number of emergency- 
department admissions, 
hospitalizations and deaths on 
any given day. The researchers 
mapped detailed data on air- 
pollution levels from 1 October 
to 10 February and modelled 
how these would have increased 
the emergency admissions. 
Their model suggests that 
there could have been around 
417 additional deaths and 
1,305 emergency-department 
admissions for asthma attacks 
over the period of the fires. An 
extra 3,151 people could also have 
been admitted to hospital for 
heart and respiratory problems. 
The results are reported in The 
Medical Journal of Australia, and 
are the first published estimate 
of the scale of the medical 
impact of the bush-fire smoke 
(N. Borchers Arriagada et al. 
Med. J. Aust. http://doi.org/darg; 
2020). Johnston calculates that 
the haze affected around 80% 
of Australia’s 25 million people, 
some for many weeks at a time. 


OUTBREAK COULD 
DELAY SPACE 
TELESCOPE LAUNCH 


The world’s most expensive 
telescope is the latest project 
to fall foul of the coronavirus 
pandemic. The James Webb 
Space Telescope was 
supposed to launch in March 
2021, but faces possible delays 
because NASA halted most work 
on the US$8.8-billion telescope 
on 20 March. The telescope 
had been going through final 
assembly and tests in Southern 
California — a region now locked 
down to stop people spreading 
the coronavirus. 

“Delaying launch is absolutely 
the right thing to do, if it will 
keep the people working on 
the mission safe,” says Zachory 
Berta-Thompson, an exoplanet 
researcher at the University 
of Colorado Boulder. “We 
astronomers can continue to be 
patient.” 

The hold-up adds to a long list 
of woes for the Webb telescope, 
which has experienced years of 
delays and cost overruns. 

NASA is pushing ahead with 
work on its Mars rover, slated 
to blast off between 17 July and 
5 August. If it misses the launch 
window, the mission must wait 
two years to try again. The rover team is 
doing “heroes’ work” in keeping 
the mission on track for a July 
launch, said Thomas Zurbuchen, 
NASA’s head of science. 

The European Space Agency 
has already delayed a Mars rover 
it planned to launch in July, in 
part because of the outbreak. 
UNIVERSITY PAYS 
MILLIONS IN 
SEXUAL-HARASSMENT 
SETTLEMENT 


The University of Rochester 
in New York has agreed to pay 
a US$9.4-million settlement 
to researchers who sued the 
institution over how it handled 
allegations of sexual harassment 
against a cognitive scientist. 
The settlement, announced 
on 27 March, brings to a close 
one of the most prominent 
harassment cases at a US 
university. 

The nine researchers sued the 
institution in December 2017 
over its response to allegations 
that Florian Jaeger, a professor 
in the department of brain and 
cognitive sciences, had sexually 
harassed students. 

The researchers — who 
include former faculty members 
and a former student who 
collectively filed complaints 
against Jaeger on behalf of 
other students — argued that 
the university retaliated against 
them for reporting Jaeger, 
harming their careers. 

In 2018, the university 
commissioned an investigation 
into the allegations against 
Jaeger, which cleared him of 
the most serious complaints. 
Jaeger, who continues to deny 
the substance of the allegations 
made against him, was not 
a party in the suit and is still 
employed at the university. 

University spokesperson Sara 
Miller confirmed the amount of 
the settlement. “No party to the 
settlement admitted liability or 
fault,” she said. “The university 
is committed to providing a safe 
and inclusive environment for 
its students, faculty, and staff.” 

All nine plaintiffs have left the 
University of Rochester. 
News in focus 


Clinical research has been disrupted as hospitals devote more resources to caring for people critically ill with COVID-19. 


CORONAVIRUS SHUTS DOWN 
TRIALS OF DRUGS FOR 
MULTIPLE OTHER DISEASES 


Studies grind to a halt as fears of health-care shortages 
and risk of exposure put the brakes on clinical research. 


By Heidi Ledford 


When 2020 began, Neena Nizar and 
her family were poised to harvest 
the fruits of a decade of hard work 
and sacrifice: a clinical trial of an 
experimental treatment for her 
two sons’ rare genetic disorder that was slated 
to start before the year’s end. 

“I can’t even put into words what we’ve been 
able to do to get to this point,” she says. “My 
kids have given bone biopsies; I gave up my job 
and moved to a new country. We’ve just been 
going, going, going.” 

Then came COVID-19. Now, Nizar wipes 
away tears in her Nebraska home as she reads 
a message from researchers at the US National 
Institutes of Health. Work to assess the tox- 
icity of the experimental therapy in animals 
has stalled because laboratories have been 
forced to close. The same might be true, she 
has heard, of the firm hired to manufacture 
the drug for clinical testing. 

Nizar’s sons have a painful degenerative dis- 
order called Jansen’s disease, which has ham- 
pered their bodies’ ability to regulate calcium 
and phosphate, causing kidney damage and 
bone deformations. Her older son, who is 11, 
has had at least one operation every year for 
the past five years. The longer he has to wait to 
receive the experimental treatment, the less 
likely it is to work. 

“My son asks me all the time, ‘So when are 
we doing this trial? When can I? I don’t want to 
feel this pain anymore,’” Nizar says. “I feel like 
we were chugging along on a train and then 
somebody dropped a huge boulder on it.” 

Scientists are rushing to launch clinical trials 
of experimental vaccines against the corona- 
virus, and treatments for COVID-19. But as 
hospitals brace for an onslaught of critically 
ill patients and laboratories worldwide are dis- 
rupted, researchers have had to shelve clinical 
trials of therapies for other illnesses. 

“We're going to see a nearly complete 
close-down in clinical research,” says Tim 
Dyer, chief executive of Addex Therapeutics, 
a biotechnology company based in Geneva, 
Switzerland. “The health-care systems will 
simply be overloaded.” On 18 March, Addex 
announced that it would delay the start of a 
clinical trial to treat involuntary movements 
in people with Parkinson’s disease. 

At Yale University in New Haven, Connect- 
icut, lung-cancer researcher Roy Herbst says 
clinical trials for cancer have been cut to 
“almost zero” and are allowed only when a participant 
is deemed to have exceptional need. 

“The whole process has really ground to a 
halt,” he says, “and I feel bad because there 
are patients who might have benefited from 
those trials.” 

But the measures are necessary, he adds. 
Many people with advanced cancer are vul- 
nerable to infection, and trips to the clinic for 
treatment and assessments could be deadly if 
patients are exposed to the coronavirus. 

Herbst has had to ask three-quarters of his 
colleagues in the oncology department at Yale 
to stay away from the hospital to minimize 
their risk of infection. Instead, they are being 
held in reserve to treat people with COVID-19 
if the first round of clinicians become infected. 
Even routine procedures such as biopsies, 
sometimes required for enrolment in a clinical 
trial, are now difficult to schedule as hospi- 
tals struggle with personnel and equipment 
shortages. 


Agencies adapt 


Government agencies have released guid- 
ance for investigators who need to suspend 
or modify trials. The US Food and Drug Admin- 
istration, for example, has issued guidance for 
trials that might have to pause, change their 
study plans or make do with incomplete data 
because of the COVID-19 pandemic. Ethics 
committees are working overtime as research- 
ers file requests to alter their clinical-trial plans 
in ways that minimize how often participants 
need to venture into the clinic, says Barbara 
Bierer, who directs the Multi-Regional Clinical 
Trials Center of Brigham and Women’s Hos- 
pital and Harvard in Boston, Massachusetts. 
Agencies and funders have shown remark- 
able flexibility, says Charles Blanke, an 
oncologist at Oregon Health & Science Uni- 
versity in Portland and leader of the publicly 
funded SWOG Cancer Research Network. The 
US National Cancer Institute announced on 
23 March that it would allow the investigators 
it funds to assess trial participants’ health 
remotely where possible. Some doctors’ 
assessments may be carried out over video 
calls, and some audits of clinical-trial proce- 
dures will be conducted virtually, with inspec- 
tors examining the paperwork online rather 
than visiting the clinic to assess standards. 
The rapid release of these guidelines is a 
particular relief because many clinical-trial 
sites did not plan for a pandemic such as that 
of COVID-19, says Blanke, despite warnings 
from experts that one was inevitable. After this 
outbreak, he says, clinical researchers will be 
better prepared, and the increased capacity 
for virtual visits will be a lasting boon to both 
researchers and patients. 

For now, it’s unclear what long-term effects 
the outbreak will have on drug regulation. 
“There will be a disruption, obviously,” says 
Bierer. “And whether that delay manifests in 
delaying final approvals is unknowable today.” 

It’s that uncertainty that haunts Nizar. She 
worries that her concerns might sound selfish 
in the face of the global suffering caused by the 
pandemic. But she also knows that the delay 
to her clinical trial could last well beyond the 
months of social isolation and lockdowns. 

Her best hope now, she says, is that regulators 
will learn from the speed and urgency with 
which a candidate vaccine for the COVID-19 
virus has been rushed into clinical trials, for- 
going some of the usual pre-trial animal tests. 
Nizar wants to see therapies for rare diseases 
treated with the same urgency. 

“Our lives have always been in panic mode,” 
she says. “Now the world has a glimpse into 
what our reality is.” 


HOW BLOOD FROM 
COVID-19 SURVIVORS 
MIGHT SAVE LIVES 


New York City researchers hope antibody-rich 
plasma can keep people out of intensive care. 


By Amy Maxmen 


Hospitals in New York City are gearing 
up to use the blood of people who 
have recovered from COVID-19 as 
a possible antidote for the disease. 
Researchers hope that the centu- 
ry-old approach of infusing patients with the 
antibody-laden blood of those who have survived 
an infection will help the city, now the 
US epicentre of the outbreak, to avoid the 
fate of Italy, where intensive-care units (ICUs) 
are so crowded that doctors have turned away 
people who need ventilators to breathe. 

The efforts follow studies in China that 
infused patients with plasma — the fraction 
of blood that contains antibodies, but not 
red blood cells — taken from people who had 
recovered from COVID-19. But these studies 
have reported only preliminary results so 
far. The ‘convalescent plasma’ approach has 
also seen modest success during outbreaks 
of severe acute respiratory syndrome (SARS) 
and Ebola, but US researchers are hoping to 
increase the value of the treatment by selecting 
donor blood that is packed with antibodies 
and giving it to people most likely to benefit. 

Hospitals in New York City are becoming overwhelmed with coronavirus cases. 

© 2020 Springer Nature Limited. All rights reserved. 

A key advantage of convalescent plasma 
is that it’s available immediately, whereas 
drugs and vaccines take months or years to 
develop. Infusing blood in this way seems to 
be relatively safe, as long as it is screened for 
viruses and other infectious agents. Scientists 
who have led the charge to use plasma want to 
deploy it now as a stopgap measure, to keep 
serious infections at bay and hospitals afloat 
as a tsunami of cases comes crashing their way. 
“Every patient that we can keep out of the ICU 
is a huge logistical victory because there are 
traffic jams in hospitals,” says Michael Joyner, 
an anaesthesiologist and physiologist at Mayo 
Clinic in Rochester, Minnesota. 

Thanks to the researchers’ efforts, the US 
Food and Drug Administration announced last 
week that it will permit the emergency use of 
plasma for patients in need. As early as this 
week, at least two hospitals in New York City 
— Mount Sinai and Albert Einstein College of 
Medicine — hope to start using survivor plasma 
to treat people with the disease, Joyner says. 

After this first roll-out, researchers hope the 
use will be extended to people at a high risk 
of developing COVID-19, such as nurses and 
physicians. For them, it could prevent illness so 
that they can remain in the hospital workforce, 
which can’t afford to be depleted. 


Hard evidence 


At the same time, US academic hospitals are 
planning to launch placebo-controlled clinical 
trials to collect hard evidence on how well the 
treatment works. 

Liise-anne Pirofski, an infectious-disease 
specialist at Albert Einstein College of 
Medicine, says that, in one proposed trial, 
researchers plan to infuse patients at an 
early stage of the disease and see how often 
they advance to critical care. Another trial 
would enrol people with severe infections. 
A third would explore plasma’s use as a pre- 
ventive measure for people in close contact 
with those confirmed to have COVID-19, and 
would evaluate how often such people fall ill 
after an infusion, compared with others who 
were similarly exposed but not treated. These 
outcomes can be measured within a month, 
she says. “Efficacy data could be obtained very, 
very quickly.” 

Even if it works well enough, convalescent 
serum might be replaced by modern therapies 
later this year. Research groups and biotech- 
nology companies are identifying antibodies 
against the coronavirus, with plans to develop 
these into precise formulas. “The biotech 
cavalry will come on board with isolating anti- 
bodies, testing them, and developing drugs 
and vaccines, but that takes time,” says Joyner. 


Should we infect healthy 
people with coronavirus? 


With no end to the coronavirus pandemic in 
sight, researchers are discussing a dramatic 
approach that could help to end it: infecting 
a handful of healthy volunteers with the 
virus to speed up vaccine testing. 


Many scientists see a vaccine as the only 
solution to the pandemic. At least one 
candidate is in safety trials, but a major 
hurdle is showing that a vaccine works. This 
typically requires large studies in which 
thousands of people receive a vaccine 
or a placebo, and researchers track who 
becomes infected naturally. 

It would be quicker to do a ‘human 
challenge’ study, argue scientists in a March 
preprint (N. Eyal et al. Preprint at DASH 
http://go.nature.com/33y1hey; 2020). This 
would involve exposing healthy people to 
the virus and seeing whether those who are 
vaccinated escape infection. 

Nir Eyal, the director of the Center for 
Population-Level Bioethics at Rutgers 
University in New Brunswick, New Jersey, 
and co-author of the preprint, tells Nature 
how the study could be done. 


Why should we consider human-challenge 
studies of coronavirus vaccines? 

They could greatly accelerate the time to
approval and potential use. Testing vaccines
in phase III trials takes a long time. That's
done on many people, some of whom 
get the vaccine and some of whom get 
placebos or competing vaccine candidates. 
Researchers then look for differences 
between these groups in infection rates. 
But many people will try to be careful in 
this outbreak — by self-isolating, say — and it 
will take a very long time until interpretable 
results emerge. If, instead, one exposes all 
study participants to the pathogen, one 
can not only rely on far fewer volunteers 
but, more importantly, take a much shorter 
period to get results. 
Are there any precedents for infecting
healthy people with a pathogen? 

We do human-challenge studies for less 
deadly diseases quite frequently — for 
example, for influenza, typhoid, cholera 
and malaria. There are some historical 
precedents for exposure to very deadly 
viruses. The thing that demarcates the 
design that we propose from some of these 
historical instances is that we feel there is a 
way to make these trials surprisingly safe. 


How could you conduct such a study? 
You would start only after some preliminary 
testing to ensure that a vaccine candidate is 
safe and that it raises an immune response in 
humans. You then gather a group of people 
at low risk from any exposure — young and 
healthy individuals — and ensure that they 
are not already infected. You give them 
either the vaccine candidate or a placebo 
and wait for an immune response. Then you 
expose them to the virus. 

You follow all the participants closely 
to catch any signs of infection as early as 
possible. You are trying to check whether 
the vaccine group is doing better than the 
placebo group. That might be in terms of 
viral levels, the time until symptoms emerge 
or whether they’re infected or not. 


Is this ethical? 
It might seem that anybody volunteering to 
participate in such a study lacks capacity 
for rational decision-making. But humans do 
many important things out of altruism. And 
although the study introduces risks, it also 
removes them. And the net risks, although 
unclear, are not clearly extremely high. So, 
it is potentially rational — even from a selfish 
point of view — to participate in such a study. 
We also let humans volunteer to do risky 
things all the time; for example, to be in the 
emergency medical services during this 
period. That elevates their risk of getting 
infected but it’s very important. In this case, 
vaccines could be our societies’ only way out 
of the bind between economic stagnation 
and widespread mortality. 


Interview by Ewen Callaway 
This interview has been edited for length 
and clarity. 
News in focus


WHAT THE CRUISE-SHIP
OUTBREAKS REVEAL
ABOUT COVID-19


Closed environments are an ideal place to 
study how the new coronavirus behaves.


By Smriti Mallapaty 


When COVID-19 was detected
among passengers on the cruise
ship Diamond Princess, the vessel
offered a rare opportunity to
understand features of the new
coronavirus that are otherwise hard to inves- 
tigate. Some of the first studies from the ship 
have provided estimates of the disease’s sever- 
ity and allowed researchers to investigate the 
share of infections with no symptoms. 

Information gleaned from such outbreaks is 
crucial for people making decisions on how to
manage the epidemic, say researchers. 

“Cruise ships are like an ideal experiment of 
a closed population. You know exactly who is
there and at risk and you can measure every- 
one,” says John Ioannidis, an epidemiologist at 
Stanford University in California. This is different
from studying the spread in a wider population,
where only some people, typically with
severe symptoms, are tested and monitored. 

On 1 February, a passenger who had dis- 
embarked from the Diamond Princess days 
earlier in Hong Kong tested positive for the 
COVID-19 coronavirus. The ship was quaran- 
tined immediately after it arrived in Japanese 
waters on 3 February, with 3,711 passengers 
and crew members on board. Over the next 
month, more than 700 people were infected. 

Outbreaks seed easily on cruise ships 
because of the close confines and high pro- 
portions of older people, who tend to be 
more vulnerable to the disease. Since the Dia- 
mond Princess, at least 25 other such vessels 
have confirmed COVID-19 cases — including 
78 cases on the Grand Princess, which was 
quarantined off the coast of California. 

Japanese officials ran more than 3,000 tests 
aboard the Diamond Princess. Testing almost 
all of the passengers and crew helped 
researchers to understand a key blind spot 
in many infectious-disease outbreaks — how 
many people are actually infected, including 
those who have mild symptoms or none at 
all. These cases often go undetected in the 
population. 

One team reports in Eurosurveillance that 
by 20 February, 18% of all infected people on 
the ship had no symptoms (K. Mizumoto et al.
Euro Surveill. 25, 2000180; 2020). “That is a 
substantial number,” says co-author Gerardo 
Chowell, an epidemiologist at Georgia State
University in Atlanta. 

Another team used data from the ship to 
estimate that in China, the proportion of 
deaths among people confirmed to have the 
disease — the case fatality rate (CFR) — was 
1.1% (T. W. Russell et al. Preprint at medRxiv 
http://doi.org/dqrk; 2020), lower than the 3.8% 
estimated by the World Health Organization. 

The agency divided China’s total number of 
deaths by the number of confirmed infections,
says Timothy Russell, an epidemiologist at the
London School of Hygiene and Tropical Medi- 
cine. This does not take into account that only 
a fraction of infected people are tested, and 
makes the disease seem more deadly than it 
is, he says. 

Russell and his colleagues used data from 
the ship — where almost everyone was tested, 
and all 7 deaths recorded — and compared it 
with more than 72,000 confirmed cases in 
China, making their CFR estimate more robust. 

The group also estimates that the infection 
fatality rate (IFR) in China — the proportion of 
all infections, including asymptomatic ones,
that result in death — is even lower, at roughly 
0.5%. The IFR is especially tricky to calculate in 
the population, because some deaths go
undetected if the person didn’t show symptoms.
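
The gap between the two rates can be seen with a little arithmetic. The Python sketch below is illustrative only: the death count, the case count and the assumed share of infections that get confirmed are hypothetical values, not figures from Russell's analysis, and it assumes that every death is counted.

```python
def fatality_rate(deaths: int, cases: int) -> float:
    """Return deaths divided by cases, as a fraction."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical inputs chosen for illustration, not taken from any study.
deaths = 30
confirmed_cases = 1_000
# Assumed share of all infections that were tested and confirmed.
ascertainment = 0.25

# Case fatality rate: deaths divided by confirmed cases only.
naive_cfr = fatality_rate(deaths, confirmed_cases)

# Infection fatality rate: deaths divided by all infections,
# including those that were never tested.
all_infections = round(confirmed_cases / ascertainment)
ifr = fatality_rate(deaths, all_infections)

print(f"CFR from confirmed cases: {naive_cfr:.1%}")
print(f"IFR over all infections:  {ifr:.2%}")
```

The smaller the share of infections that is confirmed, the further the rate based on confirmed cases sits above the rate over all infections.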

The IFR helps public-health officials to 
understand disease severity and how to inter- 
vene, says Marc Lipsitch, an infectious-disease 
epidemiologist at the Harvard T.H. Chan School 
of Public Health in Boston, Massachusetts. 


RARE OZONE HOLE
OPENS OVER THE
ARCTIC — AND IT'S BIG


Cold temperatures created the hole, which 
is about three times the size of Greenland. 


By Alexandra Witze 


A vast ozone hole — probably the
biggest on record in the north — has 
opened in the skies above the Arctic. 
It rivals the better-known Antarctic 
ozone hole that forms in the Southern 
Hemisphere each year. 


Record-low ozone levels currently stretch 
across much of the central Arctic, covering 
an area about three times the size of Green- 
land (see ‘Arctic opening’). The hole doesn’t 
threaten people’s health, and will probably 
disappear in the coming weeks. But it is an 
extraordinary atmospheric phenomenon that 
will go down in the record books. 


ARCTIC OPENING

A rare and record ozone hole has formed over the Arctic. An opening in the ozone layer appears
each spring over the Antarctic, but the last time this phenomenon was seen in the north was in 2011.

[Figure: maps of total ozone (Dobson units) over the Arctic on 23 March 2019 and 23 March 2020.]

SOURCE: NASA OZONE WATCH


JESSICA RENE BOCK PAEZ 


“From my point of view, this is the first time 
you can speak about a real ozone hole in the 
Arctic,” says Martin Dameris, an atmospheric
scientist at the German Aerospace Center in 
Oberpfaffenhofen. 

Ozone normally forms a protective blanket 
in the stratosphere, about 10 to 50 kilometres
above the ground, where it shields life from 
solar ultraviolet radiation. But each year in the 
Antarctic winter, frigid temperatures allow 
high-altitude clouds to coalesce above the 
South Pole. Chemicals, including chlorine 
and bromine, which come from refrigerants 
and other industrial sources, trigger reactions 
on the surfaces of those clouds that chew 
away at the ozone layer. These cold condi- 
tions are much rarer in the Arctic, which has 
more-variable temperatures and isn’t usually 
primed for ozone depletion, says Jens-Uwe 
Grooß, an atmospheric scientist at the Jülich
Research Centre in Germany. 

But this year, powerful westerly winds 
flowed around the North Pole and trapped cold 
air in a ‘polar vortex’. There was more cold air 
above the Arctic than in any winter recorded 
since 1979, says Markus Rex, an atmospheric 
scientist at the Alfred Wegener Institute in 
Potsdam, Germany. In the chilly tempera- 
tures, the high-altitude clouds formed, and 
the ozone-destroying reactions began. 


Balloon measurements 


Researchers measure ozone levels by 
releasing weather balloons from observing 
stations around the Arctic. By late March, 
these balloons had measured a 90% drop in 
ozone at an altitude of 18 kilometres, which 
is right in the heart of the ozone layer. Where 
the balloons would normally measure around 
3.5 parts per million of ozone, they recorded 
only around 0.3 parts per million, says Rex. 
“That beats any ozone loss we have seen in the
past,” he notes. 
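
As a quick check on those figures, a fall from about 3.5 to about 0.3 parts per million corresponds to losing roughly nine-tenths of the usual ozone at that altitude. A minimal Python sketch of the arithmetic, in which the function name is only an illustrative choice:

```python
def fractional_drop(baseline: float, observed: float) -> float:
    """Return the fraction by which `observed` falls below `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - observed) / baseline


# Approximate readings quoted in the text, in parts per million.
normal_ppm = 3.5
observed_ppm = 0.3

drop = fractional_drop(normal_ppm, observed_ppm)
print(f"Ozone drop at 18 km: about {drop:.0%}")
```

Because both readings are rounded, the result is only approximate, but it agrees with the reported drop of about 90%.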

The Arctic experienced ozone depletion 
in 1997 and in 2011 (G. L. Manney et al. Nature 
478, 469-475; 2011), but this year’s loss looks 
on track to surpass those. “We have at least 
as much loss as in 2011, and there are some 
indications that it might be more than 2011,” 
says Gloria Manney, an atmospheric scientist 
at NorthWest Research Associates in Socorro, 
New Mexico. She works with a NASA satellite 
instrument that measures chlorine in the 
atmosphere, and says there is still quite a bit 
of chlorine available to deplete ozone in the 
coming days. 

The Arctic ozone hole isn’t a health threat 
because the Sun is only just starting to rise 
above the horizon in high latitudes, says Rex. 
However, in the coming weeks, there is a small
possibility that the hole will drift to lower lat- 
itudes over more populated areas — in which 
case, people might need to apply sunscreen 
to avoid sunburn. “It wouldn't be difficult to 
deal with,” Rex says. 


TOUGH CHOICES LOOM 
FOR RESEARCHERS 
WORKING WITH ANIMALS 


Cull, release or relocate: scientists are struggling 
to protect their research and their lab animals. 


By Anna Nowogrodzki 


The eggs were close to hatching, but
Vivian Paez wasn’t sure they would sur- 
vive. She and her husband Brian Bock, 
both herpetologists, were incubating 
nearly 100 temperature-sensitive 
turtle and tortoise eggs in their laboratory 
at the University of Antioquia in Medellin, 
Colombia. By 17 March, they realized that a 
lockdown due to COVID-19 was imminent. 
The next day, as the university shut down 
all of its research and teaching activities, Bock
and Paez carefully moved all of the eggs into 
their garage at home. They placed them in plas- 
tic containers on Bock’s workbench, covered 
them with a tarpaulin and held their breath. 
Researchers everywhere are facing difficult 
decisions over what to do with research organ- 
isms amid lockdowns, university closures and 
shelter-in-place orders. Some scientists are 
able to care for animals in their usual facil- 
ities, with animal-care workers taking extra 
precautions for social distancing. Others, 
like Bock and Paez, have taken animals home 
or re-released wild-caught specimens. And 
many creatures have been, or will be, killed, 
particularly small animals such as mice. 


Life-and-death decisions 


The choices are particularly hard for scientists 
whose work directly affects human patients. 
Maria Eugénia Duarte, research chief at the 
National Institute of Traumatology and 
Orthopedics in Rio de Janeiro, Brazil, over- 
sees studies on rare and malignant sarcomas, 
mostly in children. Her team cares for roughly 
100 immunocompromised mice, which have 
been implanted with patient tumours to study 
how these grow and how best to treat them. 
With Rio on lockdown, only one researcher 
can go into the animal facility per day. Duarte 
herself can’t, because she’s over 60. Her lab 
members take turns spending 12 hours in the
lab feeding the mice, cleaning and sterilizing
cages, and checking on the animals’ health. But
if equipment breaks, such as the machine used
to sterilize the cages, no one will be able to fix 
it. “We don’t know how long this is going to be 
possible,” Duarte says. “Maybe we will need to 
prioritize and sacrifice [some of] the animals.” 
Many labs have already taken this difficult
decision. One researcher at Oregon Health &
Science University has had to euthanize more
than two-thirds of her mice. Elsewhere in the
United States, a researcher at Carnegie Mellon
University reports culling 600 mice; two
scientists at Harvard say they have had to kill
half of their research mice; and a team at the
Memorial Sloan Kettering Cancer Center has
been asked to designate no more than 60% of
its animals as essential.

A tortoise hatches, shortly after relocation.

The Jackson Laboratory, a non-profit bio- 
medical research institute based in Bar Har- 
bor, Maine, that sells millions of research mice 
per year, has noticed a several-fold increase in 
requests to freeze mouse sperm or embryos so 
that specific lines can be re-established later, 
says Rob Taft, a senior programme manager at 
Jackson. The institute has sent trucks to vari- 
ous cities to collect mice for cryopreservation; 
more pickups are planned. 

But for some labs, particularly those that 
use wild-caught research organisms, there are 
few options when it comes to maintaining or 
preserving a research programme. Solomon 
David, a fish biologist at Nicholls State Univer- 
sity in Thibodaux, Louisiana, decided last week 
to re-release 48 wild spotted gar (Lepisosteus 
oculatus) that his team had recently collected. 

As for Paez and Bock’s turtles, about 15 eggs 
have hatched so far, and the animals are liv- 
ing with the family until travel restrictions are 
lifted and they can be returned to their wild 
habitats. “At least we don’t work with jaguars 
or crocodiles,” Paez says. 
As a treaty to protect life in the open ocean
nears completion, scientists applaud the 
new pact and worry about provisions that 
could hamper research. By Olive Heffernan 


In 1945, a young chemist called Werner
Bergmann was diving off the Florida 
coast, scouring its waters for undiscovered 
marine life. One of the species he came 
across was a rather plain brown sponge. A 
colleague named the new-found creature 
Cryptotethia crypta, and Bergmann iso- 
lated from it two unknown compounds — 
spongothymidine and spongouridine. 

He suspected they could have medical uses, 
but their true value didn’t become apparent 
for more than 40 years. In 1987, the US Food 
and Drug Administration approved the first 
therapy for HIV; that drug, called azidothy- 
midine (AZT), was modelled on the sponge 
compounds that Bergmann had identified. 
By 1989, AZT had become the most expensive 
drug known, at US$8,000 per patient per year, 
generating more than $100 million a year in 
profits for the drug company. 

Eight other natural marine products have led 
to clinically approved drugs and another 28 are 
in clinical trials. Projections suggest that the 
global marine biotechnology market — which 
includes products for the pharmaceutical, bio- 
fuels and chemical industries — could reach 
$6.4 billion by 2025. There’s even a chance 
that a marine organism could help to combat 
viruses, such as the one responsible for the 
current pandemic; a compound isolated from 
red algae has shown promise in tests on differ- 
ent types of coronavirus (see A. Zumla et al. 
Nature Rev. Drug Discov. 15, 327-347; 2016). 
Commercial interest in the genetic resources 
of the high seas has never been greater. 

It has also never been more divisive. In 
the next few months, barring delays caused by 
the COVID-19 pandemic, nations are expected
to strike a historic deal to protect marine life 
in the high seas — the ocean beyond national 
governance. This region accounts for 90% of 
Earth’s available living space, and is thought to
be home to millions of undiscovered species. 
For the deal to go ahead, nations must agree 
to a system for creating large marine sanctuaries
on the high seas and must lay out rules for
how industry operates in these waters. But by 
far the most contentious issue they will tackle is 
how to regulate the use of the genetic resources
of the high seas — both the marine creatures 
themselves and their gene sequences. The goal 
is to prevent ‘biopiracy’ — attempts by wealthy 
nations or companies to commercialize bio- 
logical resources without sharing the benefits 
with their rightful owners. In the case of the high
seas, those owners are all nations. 
Researchers are overjoyed by the prospects 
of a high-seas treaty, but they are worried that
efforts to prevent biopiracy will curtail their
ability to do basic research in the open ocean.
It’s not an idle concern. Although almost all
details of the treaty have yet to be agreed, the 
draft text includes several ideas that would 
change how high-seas research happens. Most 
notable are proposals that scientists would 
need to notify the United Nations before con- 
ducting research cruises in the high seas, or 
that they would need to obtain permits for 
such work, which would require them to share 
data or other benefits from their research. 
Most scientists are keen to share benefits 
with developing nations and Indigenous 
groups, but they do not favour constraints 
on research. Some fear that the proposed 
anti-biopiracy regulations will mirror those of
the Convention on Biological Diversity, most 
notably the Nagoya Protocol, an international 
agreement adopted in 2010 that restricts scien- 
tists’ access to the territories of other nations, 
including their coastal waters. Nations drafted 
the Nagoya Protocol to prevent companies 
from patenting Indigenous medicines without 
sharing the profits, and now some researchers
say it has made it difficult to get permits to work
in some developing nations. 

“I’m delighted that the UN is undertaking 
this effort as a way of trying to ensure conser- 
vation and appropriate oversight of the high 
seas,” says Peter Girguis, an ocean scientist and 
evolutionary biologist at Harvard University 
in Cambridge, Massachusetts. But Girguis 
says he is “hugely concerned that we'll find
ourselves hindering access for everybody to
do academic research”.

A new treaty will govern uses of organisms from the open ocean, such as this hydromedusa.
DAVID SHOLE/NPL


Final stretch 


Conservationists and scientists have pushed for 
a high-seas treaty for more than a decade, and
they are now entering the homestretch. Nego- 
tiators were scheduled to start the fourth and 
final round of talks on 23 March in New York, but
that meeting has been postponed until further 
notice because of the COVID-19 pandemic. 
The treaty would close a giant gap in the existing
network of international and national laws.
Countries have exclusive rights to fish and mine 
in waters up to a distance of 200 nautical miles
from their shores. Beyond that are the high seas. 
Right now, certain activities on the high seas, 
such as mining and cable laying, are regulated 
by the UN Convention on the Law of the Sea,
but there is no law to protect marine life in this
vast region. 

Up to now, some 34,000 marine natural products
have been identified that could potentially
be used in medicine, food and cosmetics. Of the 
eight existing marine drugs, five are cancer 
treatments. With the global marine biotech- 
nology market growing rapidly, concern has 
mounted about ownership of these resources. 
At present, it’s possible for anyone to develop 
and profit from a product derived from biological
samples taken in the high seas, and some
developing nations are concerned that wealthy 
nations or companies will reap most of the prof- 
its to be made from this global commons. 

Already, 12,998 genetic sequences from 
marine species have been patented. The 
multinational chemical giant BASF, based in 
Ludwigshafen in Germany, has registered 47% 
of those gene sequences in patents — a figure 
that Robert Blasiak, an ocean-governance 
researcher at the Stockholm Resilience Centre 
in Sweden, and his colleagues say represents 
a worrying trend of corporate control over 
marine genetic resources. A sequence from 
an alga, for example, has been used to fortify 
canola oil, from the rapeseed plant, with 
omega-3 fatty acids. 

When nations meet to thrash out the treaty, 
they will have to decide whether the new law 
to prevent biopiracy covers physical samples 
only, such as an alga and its DNA, or whether
it extends to digital sequence information, 
such as a gene sequence from an alga stored 
in a data repository.

They will also have to consider two other 
issues related to biopiracy: how to ensure 
equal access to marine genetic resources and 
how to share benefits from them. These provisions
would parallel the protections adopted
through the Nagoya Protocol. Developing 
nations pushed for the protocol out of concern 
that companies were patenting Indigenous 
medicines without sharing the profits. 

One example involves the Madagascar 
periwinkle, Catharanthus roseus, which has 
been used for centuries as a medicine in 
Africa and China. Compounds from the plant 
and their derivatives are now ingredients of 
numerous medications patented and sold by 
large pharmaceutical companies. So far, the 
provisions included in the Nagoya Protocol 
have led to one profit-sharing arrangement, 
for South African rooibos tea. 

Nations hope to strike a high-seas deal this 
year, but there are still deep philosophical 
divides. Countries such as Russia, the United 
States and Japan, which have the technological
and financial clout to scour the deep sea in
search of new drugs, cosmetics and food prod- 
ucts, are advocating a ‘free seas’ mentality that 
favours unrestricted access, patent protection 
and sharing of non-financial benefits such 
as data. Developing nations, typified by the 
Group of African States (the African Group),
argue that marine genetic resources are ‘com- 
mon heritage’ and need some oversight so that 
their use can be monitored and any profits, as 
well as other benefits, shared. “If there’s almost 
no form of regulation, there wouldn’t be any 
opportunities for us to track and trace when 
there is commercialization,” says Michael 
Kanu, deputy permanent representative to 
the UN for Sierra Leone, and coordinator of 
the African Group at the treaty talks. 
Christian Tiambo, a livestock scientist at 
the International Livestock Research Insti- 
tute in Nairobi, agrees. He says that develop- 
ing nations and Indigenous people should be 
worried about biopiracy, and that it’s very 
important to regulate access to the high seas 
to prevent biopiracy from happening there. 


Global permit scheme 


Just what those regulations would look like is 
up for discussion, but the draft text includes 
several ideas. One is to create a global body 
that would authorize, and possibly even grant 
permits to, scientists to undertake research on 
life in the high seas — a first for researchers. 
An alternative idea is for scientists to submit
their post-cruise data, research findings and 
sporadic progress reports to a committee or a
platform created by the UN. There is also a pro- 
posal to assign unique identifiers to all marine 
genetic resources on collection, allowing their 
use to be tracked. 

Siva Thambisetty, who studies patents and 
biotechnology at the London School of Eco- 
nomics, says that these options essentially 
follow two different paths. A light-touch 
approach would require researchers and com- 
panies to give notification of their research 
plans and voluntarily share any benefits, such 
as data. A more tightly regulated scheme would
grant permits to scientists for access to the high 
seas in exchange for their sharing benefits, such 
as data or any profits made from new products. 

Thambisetty says she favours conditional 
permits, rather than a system that assumes 
scientists will be given approval and encour- 
aged to share benefits voluntarily. She says 
that granting scientists exclusive rights to data 
for a short period, perhaps one or two years, 
might be a fair exchange for a permit.

Although researchers accept the idea of 
some controls, they worry that certain ones 
could be too onerous. 

Muriel Rabone, for example, a curator and 
ecologist at the Natural History Museum in 
London, recognizes problems with the cur- 
rent system but has concerns about changes. 
“It’s not good for the science community to 
have this big north-south divide in terms 
of research capacity,” she says, adding that 
“we need things that are going to streamline 
processes rather than hamper them”. 

“The idea that approval would be given by
an overseeing body before a cruise is allowed
throws up a lot of questions: who’s approving
this, how and why? What sort of bottleneck is
that going to create?” she says.

Cancer drugs are derived from this tunicate.

Scientists are wary because similar 
anti-biopiracy laws — and the Nagoya Protocol 
in particular — have hampered foreign research- 
ers from gaining access to certain countries, 
such as Colombia and Sri Lanka. “A lot of the
biodiversity research community has been a little
bit bruised by Nagoya,” says Rabone. Shirley 
Pomponi, a marine biodiscovery researcher 
at Florida Atlantic University Harbor Branch 
in Fort Pierce, Florida, says that before access 
and benefit-sharing laws came into place, her 
team collected samples from around the world. 
But she has now had to stop working in some 
countries, such as Brazil and Colombia.

“It just got to be harder and harder,” she says. 
“We would be days away from an expedition 
that was going to cost us hundreds of thou- 
sands of dollars and still not have permits from 
the countries to be able to bring our ship into 
their waters. And it’s just not worth the hassle. 
So we thought, ‘let’s just focus on the US’.”
Although some scientists say that the Nagoya
Protocol has restricted their work, Tiambo says 
he already sees many benefits coming out of the 
agreement. Scientists are now being trained to 
better understand the value of genetic informa- 
tion, he says, and “this information is trickling 
down to local communities, who can now 
really take advantage of the genetic resources 
that they have been keeping for generations”. 
Researchers working on dairy-cattle genomics, 
for example, have shared data and expertise 
with African scientists and communities, which 
has allowed them to improve their national 
breeding programmes. 

Rachel Wynberg, a bio-economics expert 
at the University of Cape Town, South Africa, 
agrees that anti-biopiracy laws, including the 
Nagoya Protocol, have had benefits. “There 
has definitely been a shift in perception and in 
the ethics of working with biodiversity. There 
has also been a significant shift in company 
practices,” she says. But she questions whether 
the Nagoya Protocol has had any meaningful 
impact on economic development, conserva- 
tion and Indigenous people. 


Balancing act 


Despite the concerns, many see a way to craft an
agreement that both restricts biopiracy and fos- 
ters research. If, for example, a unique identifier 
is assigned to each sample, then if a product is
developed, a share of profits will go into a pot 
that could be shared between nations for use 
in biodiversity conservation. “This would allow 
for full traceability of materials all the way from 
the ocean floor to commercialization,” says 
Marcel Jaspars, a biodiscovery researcher at 
the University of Aberdeen, UK, who is advising
the UN on how to design the treaty.

Another possibility that’s been floated is 
that the treaty could support, rather than 
restrict, access to the high seas, treating 
access as a benefit. Scientists from develop- 
ing countries could join research cruises with 
other nations, finding available berths on ships 
through a global registry of research cruises. 
“This could promote access to the high seas 
by all scientists who are interested, ensuring 
that those scientists are there when discoveries
are made,” says Girguis. Scientists from the
developing world would then also have a share
of patents arising from that research. 

Rather than resisting change, marine scien- 
tists need to step up to the mark, and accept 
the need for new research protocols, says 
Thambisetty. 

Now is the time to engage, say researchers 
who have followed the negotiations. “If we get 
it right, this treaty could be transformational,” 
says Jaspars. “We could actually end up with 
more knowledge about the deep oceans than 
we had before.” 


Olive Heffernan is a science journalist in 
Dublin. 
A 3D magnetic resonance imaging scan of the brain.


Neuroscience needs 
some new ideas 


A history of the metaphors behind brain research faces 
a dark past and disquieting future. By Stephen Casper 


The poet Emily Dickinson rendered the
brain wider than the sky, deeper than 
the sea, and about the weight of God. 
Scientists facing the daunting task of 
describing this organ conventionally 
conjure up different kinds of metaphor — of 
governance; of maps, infrastructure networks 
andtelecommunications; of machines, robots, 
computers and the Internet. The compari- 
sons have been practical and abundant. Yet, 
perhaps because of their ubiquity, the meta- 
phors we use to understand the brain often go 
unnoticed. We forget that they are descriptors, 
and see them instead as natural properties. 
Such hidden dangers are central to biolo- 
gist and historian Matthew Cobb’s The Idea of 
the Brain. This ambitious intellectual history 
follows the changing understanding of the 
brain from antiquity to the present, mainly 
in Western thought. Cobb outlines a grow- 
ing challenge to the usefulness of metaphor 
in directing and explaining neuroscience
research. With refreshing humility, he
contends that science is nowhere near work- 
ing out what brains do and how — or even if 
anything is like them at all. 

Cobb shows how ideas about the brain have 
always been forged from the moral, philosoph- 
ical and technological frameworks to hand for 
those crafting the dominant narratives of the 
time. In the seventeenth century, the French 
philosopher René Descartes imagined an 
animal brain acting through hydraulic mechanisms,
while maintaining a view of the divine
nature of a mind separate from matter. Later
authorities, such as the eighteenth-century
physician and philosopher Julien Offray de La
Mettrie, secularized the image and compared
the human to a machine. The Italian physicist
Alessandro Volta rejected the idea of ‘animal
electricity’, proposed by his rival Luigi Galvani
as a vital force that animates organic matter.
Volta was driven at least partly by his aversion
to the mechanistic view.

The Idea of the Brain:
A History
Matthew Cobb
Profile (2020)

New metaphors came from nineteenth-cen- 
tury phrenology, evolutionary theory and the 
doctrine of inhibition in physiology — the idea 
that the nervous system could repress actions 
and behaviours. Then came the age of commu- 
nication, and with it fresh language for the brain. 


Image clash 


The late-nineteenth-century discovery 
of neurons led to a clash of rival images. 
Reformers imagined separate components, 
comparable to the wires and signals of the 
nascent telecommunications infrastructure. 
Conservatives cast the nervous system as a 
continuous network (or reticulum) akin to the
blood circulation, feeling that this explained 
how volition and mind might work; to them, 
discrete signalling implied heterodox notions 
of mind, perhaps even of the soul. 

The post-1940 proliferation of references to 
enchanted looms, ghosts in machines, logical 
circuits, reptile brains, parallel processors and 
uploaded minds grew from those foundations. 
Cobb notes, but only in passing, that we need 
new images to make sense of research devel- 
opments ranging from artificial intelligence 
to mini-brains grown in the laboratory to brain 
implants. He doesn’t try to invent examples. 

The narrative Cobb offers is familiar. The epis- 
temic power of metaphors in science has long 
been recognized by historians and philosophers 
of science. Yet for the popular audience he tar- 
gets, Cobb’s account is an important contribu- 
tion: few have offered such accessible insights, 
with choice examples and clear explanations 
of the societal factors that lie beneath. Cobb 
also eloquently shows how figurative language 
does much more than simply distil or give shape
to complex, intangible subjects. Metaphors 
change how science is done, by licensing new 
interpretations or inspiring new experiments. 

Cobb also reminds us that metaphors conceal
as much as they reveal. The ideas that they
so persuasively represent often ignore key 
elements. Comparing the brain to a computer
is beguiling, but neglects that brains are also 
organs, and aware ones at that. Our existing 
images and language are desperately limit- 
ing when it comes to imagining a situation in 
Books in brief


Wayfinding 

Michael Bond Picador (2020) 

This rewarding meditation on “how we find and lose our way” might 
have been called “Am I here?”, the tragic refrain of science writer
Michael Bond’s grandmother after she developed dementia. The 
book astonishes as it ranges from the neuroscience of meandering 
rats to the deleterious effects of satellite navigation. A desert ant, we 
learn, can forage at least 100 metres from its nest, then scurry back 
in a straight line — equivalent to a human wandering for a day and a 
night, then heading straight home without help from GPS. 


The World According to Physics 

Jim Al-Khalili Princeton Univ. Press (2020) 

Quantum physicist, historian and science broadcaster, Jim Al-Khalili 
is well placed to summarize the past, present and future of physics 
for a lay audience, without using mathematics. After a tantalizing 
chapter on scale, he analyses space, time, energy, matter, quanta, 
thermodynamics and various attempts to unify the general theory 
of relativity with quantum field theory — although he never defines 
a black hole. On the debate between Niels Bohr and Albert Einstein, 
Al-Khalili sides with Einstein, who believed in an objective reality. 


Kingdom of Frost 

Bjørn Vassnes (transl. Lucy Moffatt) Greystone (2020)

Science journalist Bjørn Vassnes’s brief book demonstrates how “life's
different revolutions have been intertwined with the history of the
cryosphere”. He includes memories of digging tunnels to his house
in the Norwegian Arctic during snowy 1970s winters, and experience
of Bangladesh, which never sees snow yet survives on river water 
from threatened Himalayan glaciers. Vassnes discusses how reindeer 
grazing eradicates vegetation that reduces the Arctic’s heat-deflecting 
albedo effect; perhaps it could aid the fight against global warming? 


A Naturalist in the Amazon 

Henry Walter Bates Natural History Museum (2020) 

“The best book of Natural History Travels ever published in England,” 
said Charles Darwin of entomologist Henry Walter Bates’s 1863 The 
Naturalist on the River Amazons, an 11-year journal inspired partly
by Darwin's diary of his 1831–36 journey on the HMS Beagle. This
enchanting part-facsimile justifies his words. Bates writes grippingly on 
anacondas, bird-killing spiders and blowpipes. Although little-known 
now, his name endures in ‘Batesian mimicry’: a survival strategy based 
on apeing harmful species, which he observed in butterflies. 


Lucean Arthur Headen 

Jill D. Snider Univ. North Carolina Press (2020) 

There are no references to Lucean Arthur Headen on Wikipedia; nor 
did he leave behind significant personal papers. Yet this black inventor 
and entrepreneur, born in racially segregated North Carolina in 1879 
among formerly enslaved artisans, deserves study. Local historian Jill 
Snider’s biography reconstructs him. By his death in 1957, 26 years
after moving to Britain, Headen had spent almost 4 decades running 
US and UK companies making cars and products based on his patents 
— some of which are still cited. Andrew Robinson 


24 | Nature | Vol 580 | 2 April 2020 


© 2020 Springer Nature Limited. All rights reserved. 


which the mental, physical and embodied are 
so tightly enmeshed. 

Thus, despite their power, our metaphors 
have done little to bridge the divisions that 
emerge as scientists seek to understand 
what brains are. After centuries of research, 
including recent advances in exploring con- 
sciousness through imaging techniques such 
as functional magnetic resonance imaging, 
there’s still no answer to Shakespeare’s question
in The Merchant of Venice: “Tell me where
is fancy bred, Or in the heart or in the head?” 

We can’t stop using metaphors. Scientists 
depend on figurative language to organize 
and communicate thoughts and ideas. But 
whether the neurosciences can get closer to 
a compelling idea of the brain in the decades 
ahead might depend on a full reckoning of the
role of metaphors. Top of the list: research- 
ers should acknowledge that although certain 
word choices seem innocent, many carry 
malign overtones. Ideas of the brain have often 
embedded inequities and prejudices about 
race, class, gender, sexuality and agency. 

On these matters, Cobb should have said 
more. The word ‘racist’ appears only a few 
times in his book, and then only in footnotes. 
But a little thought makes clear that seemingly
innocent metaphors like ‘higher’ and ‘lower’ 
functions, or descriptions of specific anatom- 
ical structures as ‘primitive’, carry racialized 
baggage. When originally characterized, 
they spoke to the ghastly view that the nerv- 
ous systems of white, upper-class men made 
them evolutionarily superior to those they 
subordinated at home and abroad. Similarly, 
it is discomfiting to realize that Broca’s area, 
linked to language processing, is named for the 
French physician Paul Broca, who believed in a
hierarchy of peoples. That, in 2020, there are 
scientists who still talk about ‘female brains’, 
an idea Cobb rightly derides, is evidence that 
gender (a word that appears only in the bibli- 
ography) remains central to too many people’s 
ideas of how the brain is constructed. And he 
makes no mention of what neurodiversity 
advocacy might mean for figurative language. 
Whatever new metaphors are to come, ones 
that embrace differences inclusively will be 
more insightful and more profound. 

The Idea of the Brain puts our current 
predicament in context and synthesizes 
much that needs attention. It is a very good
book. It could have done more in a time when
science is coming to terms with the limita- 
tions of the straight, white, wealthy, Western, 
non-disabled, male perspective. But I hope it 
provokes contemplation about why certain 
metaphors linger, where they come from, how 
they persist, and in what ways they burden us 
with the invisible assumptions of past cultures. 


Stephen Casper is professor of history at 
Clarkson University, Potsdam, New York, USA. 
e-mail: scasper@clarkson.edu

The elongated bristlemouth (Sigmops elongatus) is abundant in the oceans’ twilight zone.


Study the twilight zone 
before it is too late


Adrian Martin, Philip Boyd, Ken Buesseler, Ivona Cetinic, Hervé Claustre,
Sari Giering, Stephanie Henson, Xabier Irigoien, Iris Kriest, Laurent Memery,
Carol Robinson, Grace Saba, Richard Sanders, David Siegel, Maria Villa Alfageme
& Lionel Guidi


Exploitation and degradation 
of the mysterious layer 
between the sunlit ocean 
surface and the abyss 
jeopardize fish stocks and 
the climate.

The twilight zone contains the largest
and least exploited fish stocks of the 
world’s oceans. Spanning from just 
below 200 metres to 1,000 metres 
deep, it is an interface between the 
well-studied marine life in the sunlit zone 
above and the ecosystems of the abyss below. 
It has a major role in removing carbon dioxide
from the atmosphere and storing it for centu- 
ries or longer. The twilight zone is also privy to 
the largest migration on Earth. Huge numbers 
of fishes and zooplankton move hundreds of 
metres towards the surface each night to feed, 
before retreating back down at dawn. 
Yet the zone is poorly understood — phys- 
ically, biogeochemically and ecologically. 
Even the number of organisms that live there
remains a mystery, let alone their diversity and 
function. 

It is alarming, then, that this vast ocean 
domain is at risk in three ways — even before 
any of the potential consequences are under- 
stood. First, the world’s growing population 
has an increasing need for food. Second, sea- 
floor mining for minerals and metals could 
release waste into the region. And third,
climate change is altering temperature, acid- 
ification and oxygen levels in ways that are 
likely to affect life there.

The twilight zone is hard to study. Its 
organisms are difficult to sample and ana- 
lyse, being sparsely distributed, elusive and 
often fragile. They also live at pressures of up 
to 100 atmospheres, which poses problems 
for laboratory-based investigations. 

Critics might argue that waters near coasts 
and above shelves are more deserving of study, 
given the huge environmental pressures there, 
as well as their importance to societies. And, of 
course, they need attention. Sadly, however, 
it is too late to avoid widespread environmen- 
tal damage to these inshore regions. Instead, 
research efforts and local policies must aim at 
mitigating the worst effects. 

By contrast, the twilight zone is almost 
pristine. Moreover, the majority of it lies 
beyond national jurisdiction. This makes it 
of common interest and responsibility, and 
means that global agreement is necessary to 
manage it. 

Here, we outline the steps needed to ensure 
that enough is known about this complex 
global ecosystem to inform decisions about the 
impacts of climate change and potential future 
exploitation. We call on the international 
marine research community to focus its attention
on the twilight zone during the upcoming
United Nations Decade of the Ocean, which
runs from 2021 to 2030. In the spirit of the UN’s
Sustainable Development Goals, we should 
seize the opportunity to establish a global 
policy that will protect this vast ecosystem 
for present and future generations. 


Carbon pump 


At present, we know just enough about the 
twilight zone to recognize its importance in 
maintaining a healthy ocean. 

Phytoplankton growing in the sunlit layer 
fuel multiple food-supply routes into the zone 
that sustain organisms from bacteria to giant 
squid. In the process of consuming this food, 
and each other, the twilight-zone animals produce
CO₂, consume oxygen and release nutrients
back into the water (see ‘Twilight zone’).
In the winter, cold, windy weather mixes water
containing the recycled nutrients with water
from the surface layer. In this way, the twilight
zone has an important role in supporting
phytoplankton growth the next spring.

Although winter mixing can release carbon 
back into the atmosphere, a fraction of it ends 
up in deeper waters, where it can be locked 
away, typically for centuries. This downward 
transport of organic matter, mediated by life 
in the twilight zone, is called the biological 
carbon pump, and the twilight zone is central 
to its strength. This deeper flux of material
becomes food for the animals there. The 
small amount that eventually reaches the sea 
floor sustains everything from bacteria to sea 
cucumbers. 


Glaring gaps 
Unfortunately, little is known about how the 
twilight zone performs these roles. This makes 
it difficult to predict future ocean oxygen levels 
or how organic carbon will be stored in the long
term. Moreover, the effects of climate change 
on ocean temperatures and oxygen levels will 
alter how the biological pump operates. 
Knowledge gaps range from fundamen- 
tal information, such as what species dwell 
there and what their metabolic rates are, to
how they behave and adapt to their environment.
Bacteria colonize ‘marine snow’ (sinking
aggregates of organic material); krill form
dense, localized swarms; and some fishes 
have evolved vision that is tuned to dawn and 
dusk. But how do such adaptations affect the 
functioning of the twilight zone? 


The patchiness of the information makes it 
hard to predict how the twilight zone might 
respond to human pressures. For example, 
the fishing industry is likely to target species 
that are very abundant, such as elongated
bristlemouth (Sigmops elongatus), but will also
remove many other animals in the process.
This could reduce the ecosystem’s resilience
and change global nutrient and carbon cycles.


Research priorities 


The following three questions should be prior- 
itized to plug these knowledge gaps. 


How many organisms live in the twilight
zone, and how diverse are they? Estimates
for the biomass of fishes there range between
1 billion and 20 billion tonnes. Surface waters
are estimated to hold 1 billion tonnes. (The
world’s human population has a total weight
of 0.5 billion tonnes.) But it is not clear what
fraction of the organisms are siphonophores
(relatives of jellyfish) or cephalopods (such
as squid).

Twilight zone. Animals here influence the recycling of nutrients and long-term storage of carbon
in the ocean, but little is known about them and what they do. Climate change and
human exploitation are likely to change these functions.


Which ecological processes transform and 
consume organic material? Marine snow is 
common in the twilight zone. Zooplankton 
could be breaking it up so that it forms a
slow-sinking substrate for bacteria, which in turn
are a nutritious food for the zooplankton. But this
theory, known as microbial gardening, is yet 
to be tested. 


How is organic material transported into and
out of the twilight zone? Researchers need to
determine the relative importance of a range of
mechanisms that vary with location, time and
depth. These span from physical processes to
animal behaviour. Ocean currents transport 
tiny particulate and dissolved organic matter 
to greater depths. Larger organic aggregates 
and faecal pellets sink. And daily and seasonal 
animal migrations release waste products at 
depth. 


Addressing these three questions will help 
to clarify what sets the balance between how 
much organic material is consumed in the 
twilight zone, restoring nutrients and sus- 
taining the fish stock, and how much passes 
on to greater depths, sequestering carbon 
away from the atmosphere. Only with this
knowledge can the wider consequences of 
exploiting the region be predicted. 


Three steps 


The following three steps will help in address- 
ing these research priorities. They make use 
of a range of innovative tools and techniques.


Conduct a census. Organisms ranging from 
bacteria to large cetaceans need to be counted. 
Devices such as the Underwater Vision Pro- 
filer (UVP) can be deployed in the twilight 
zone to capture images of plankton that can 
be identified and counted using a web-based 
application known as EcoTaxa, which is linked
to a taxonomic database containing roughly 
100 million images of planktonic organisms 
and particles. A smaller version of the UVP 
can be attached to autonomous vehicles 
to extend sampling beyond the times and 
places that research vessels can visit. Larger 
organisms can be identified with short-range 
high-frequency acoustic sensors. When
deployed at depth, these sensors can help sort 
fishes from siphonophores. DNA harvested 
from the environment can also be used to 
infer the identity and diversity of elusive or 
fragile animals, including large fishes, marine 
mammals such as beaked whales and gelati- 
nous organisms. Any new approaches should 
be calibrated against conventional physical 
sampling methods and follow internationally
recognized, standardized procedures. 


Determine what is processing and consum- 
ing organic material. To do this, changes in 
the size, source and sinking speed of organic 
aggregates need to be observed in situ as the 
particles descend through the water column. 
This should be done mainly using optical sen- 
sors. The information gleaned could then be 
combined with simultaneous estimates of the 
abundance of zooplankton and fishes and the 
intensity of ocean mixing, to determine what 
is breaking up the aggregates and retaining 
them in the twilight zone. A range of ‘omics’
approaches, including metagenomics,
metatranscriptomics and metabarcoding, should
be used to provide insight into how the associated
organic material is being eaten by microorganisms.


Track organic material. Argo floats already 
roam the ocean and collect information on 
properties such as temperature, phytoplank- 
ton abundance and nutrient levels as they 
shuttle between the surface and a depth of 
2,000 metres every 10 days. Imaging systems 
capable of measuring the size and abun- 
dance of organic particles are being added 
to these floats. The current network needs 
to be increased from 200 to 1,000 operating 
floats, with imaging sensors added to all. Opti- 
cal sensors on other autonomous underwater 
vehicles such as gliders can be used to yield 
information on the size and shape of organic 
particles. 

These data could be combined with infor- 
mation from the Plankton, Aerosol, Cloud, 
ocean Ecosystem (PACE) mission, which NASA 
plans to launch in December 2022. The sat- 
ellite will use a spectrometer to measure the 
colour of the ocean. Those data will be useful 
in determining the types of phytoplankton in
the surface layer of the ocean, fuelling the twilight
zone. Laser-mapping technology such as LIDAR
(light detection and ranging) has been shown to
detect the daily migrations of zooplankton
from space; these observations should be combined
with sparse local data.


To obtain the most complete picture possible 
of the global twilight zone, we call on national 
and international ocean projects to coordinate 
efforts, rather than duplicate them. We encour- 
age researchers and institutions to link up with 
JETZON (Joint Exploration of the Twilight Zone
Ocean Network), an initiative launched earlier
this year to improve communication and
coordination. Currently, 15 projects from
12 countries are involved, each studying just a
few locations. This is a good start. But given the
vast size and complexity of the twilight zone,
everyone, from independent researchers to
international projects, needs to join forces to
succeed.

The Deep-See sensor platform heads for its first dive into the twilight zone.

There is no time to waste. We cannot let 
climate warming and human exploitation 
fundamentally alter the twilight zone before 
we even begin to understand the potential con- 
sequences for the health of the planet. 
Readers respond


Correspondence 


Share mobile data to 
curb COVID-19 


Open sharing of clinical, 
epidemiological and virological 
data between governments and 
researchers during the current 
COVID-19 pandemic is shaping 
international public-health 
strategies. However, digital 
data from billions of mobile 
phones and footprints from 
web searches and social media 
remain largely inaccessible to 
researchers and governments. 
These data could support 
community surveillance, 
contact tracing, social 
mobilization, health promotion, 
communication with the public 
and evaluation of public-health 
interventions. 

We urge technology 
companies to work with 
researchers and governments 
to find ways to share their data 
rapidly in a legal, proportionate, 
ethical and privacy-preserving 
manner. The public’s consent 
to sharing personal data for the 
common good can be obtained 
dynamically through existing 
mobile applications, putting the 
public at the heart of the public- 
health response to COVID-19. We 
ask governments and funders 
to create new centres of digital 
public health to deploy and 
evaluate proven innovations. 

The technology sector has 
benefited from massive public 
investment in the Internet, the 
GPS and mobile technologies. 
Now is the time for tech to invest
in society. 


Predatory journals: 
dodging the radar 


Agnes Grudniewicz and 
colleagues highlight the need 
to define what constitutes a 
predatory journal (Nature 576, 
210-212; 2019). History shows, 
however, that such journals and 
their publishers rapidly adapt to 
filters that might discredit them. 
In their early days, such 
journals were ephemeral, with 
false claims of indexing, vague 
titles (such as International
Journal of Applied Sciences 
and Engineering), fraudulent 
publication fees and dubious- 
looking websites. By contrast, 
modern predatory journals use 
more specific titles and release 
well-designed issues. They 
have real indexing and well- 
developed websites. They are 
owned by supposedly legitimate 
organizations, publish for 
free (because they have other 
interests), run counterfeit 
archives and safeguard 
themselves with plagiarism 
checks (see F. H. Kakamad et al. 
Int. J. Surg. Open 17, 5–7; 2019).
However, the skipping or 
faking of scientific review 
remain cornerstones for 
predatory journals and 
publishers. In our opinion, it 
is dangerous to exclude the 
criterion of inadequate peer 
review from any definition 
of predatory journals, as 
Grudniewicz and colleagues 
propose, because that definition 
would then fail to catch its 
criminal targets. 
College London, UK.
r.a.mckendry@ucl.ac.uk

see go.nature.com/2ub8qjq

Predatory journals:
tell-tale lax review 


Agnes Grudniewicz and 
colleagues argue for a definition 
of a predatory journal that will 
protect scholarship (Nature 
576, 210-212; 2019). Their 
proposed definition excludes an 
important feature of predatory 
journals, poor-quality peer
review, on the grounds that
such reviews are not accessible 
for analysis. It is a sad irony that 
this lack of transparency, a tell-tale
trait of predatory journals,
should be used to justify
omitting an assessment of peer- 
review quality. 

If misuse of the peer-review 
label is not included in the 
definition of predatory journals, 
it could strengthen rather than 
weaken them. Formal listings 
of those journals might shrink 
under such a definition: many
journals would be removed 
because their questionable 
peer-review procedures have 
escaped scrutiny and they seem 
otherwise respectable. They 
could then become attractive 
outlets to potential authors. 

As Grudniewicz and 
colleagues point out, legitimate 
journals that keep their peer- 
review processes under wraps 
encourage predatory practices. 
If publication of signed referees’ 
comments were standard, 
journals publishing unrefereed 
papers would quickly be 
exposed. In our view, therefore, 
open peer review should be 
compulsory and the definition 
of predatory journals should 
include the quality of peer 
review.

Röntgen, Becquerel
and radiation 


Last month marked the 175th 
anniversary of the birth of 
German physicist Wilhelm 
Conrad Röntgen (1845–1923),
who won the 1901 Nobel prize 
for his discovery of X-rays. His 
work is still a linchpin of modern 
science and medicine. 

Röntgen’s academic career
had a less-than-propitious start. 
Wrongly accused of being the 
author of a caricature of his class 
teacher, he was expelled from 
high school in the Netherlands 
without graduating. His 
application to Utrecht University 
in the Netherlands was rejected
as a result, but he went on to
study mechanical engineering 
at the Federal Polytechnical 
School (now the Swiss Federal 
Institute of Technology) in 
Zurich. He was then rejected 
by Julius Maximilian University 
of Würzburg in Germany for
a postdoctoral qualification, 
which he eventually secured at 
the University of Strasbourg, 
France. 

Despite this rejection, 
Röntgen later donated his Nobel
Prize money to the University of
Würzburg. In another example
of his philanthropy, he declined 
to patent his X-ray discovery, 
thereby making it available to 
the world. He also turned down 
the honour of a noble title.

In 1903, French engineer 
Henri Becquerel was awarded 
the Nobel Prize in Physics, along 
with Marie and Pierre Curie 
(see also Nature 579, 490-491; 
2020), for their pioneering work 
on radioactivity. Becquerel was 
inspired by Röntgen’s X-rays,
which gave him insight into 
other forms of radiation, such 
as phosphorescence (see Nature 
78, 414-416; 1908). 


Andreas Otte Offenburg 
University, Germany. 
andreas.otte@hs-offenburg.de 



Expert insight into current research 


News & views 


Organic chemistry 


Strong chemical reducing 
agents produced by light 


Radek Cibulka 


An electrically neutral radical has been found to be a potent
chemical reducing agent when excited by light. Remarkably, it
is produced from a positively charged precursor that has long
been used as a strong excited-state oxidizing agent. See p.76


When molecules absorb light, they enter 
an excited state and become more reactive 
than when in their ground state. Light energy 
can therefore be used to generate reactive 
molecules that undergo chemical transfor- 
mations that would otherwise be difficult to 
achieve. Several powerful oxidizing agents 
have been generated using light excitation, but 
strong reductants have been more difficult to 
produce. On page 76, MacKenzie et al. report
the discovery of a light-generated molecular
species that exhibits reducing properties comparable
to those of alkali metals, and which is
therefore one of the strongest known chemical
reductants.

Chemical reactions mediated by visible light 
are important tools in organic synthesis. These 
reactions occur analogously to light-driven 
biological processes such as photosynthesis 
— with the help of a light-absorbing catalyst. 
In photoredox catalysis, an excited catalyst
molecule exchanges a single electron with a
reaction partner (the substrate). During this 
process, which is known as photoinduced 
electron transfer (PET), the substrate is 
transformed into a reactive free radical; this 
undergoes a subsequent reaction to give one 
or more final products. Such processes usually 
occur at ambient temperature because their 
energy barrier is overcome using light energy. 

Photoredox catalysis has undergone 
unprecedented development in the past 
decade, but some challenges remain. One 
is that no photoredox catalyst provides a 
reductant comparable in strength to that 
of alkali metals such as lithium and sodium. 
Alkali metals are still used in various reactions
as potent reductants, despite their associ- 
ated hazards and their tendency to produce 
undesired side products (that is, they have 
relatively low selectivity). 


One example of a photoredox reductive 
process is the generation of molecular species 
called aryl radicals, which, when organic com- 
pounds are being synthesized, can be used as 
a source of aryl groups (groups derived from
a benzene ring or a benzene analogue by the
removal of a hydrogen atom). Aryl halide compounds,
in which an aryl group is attached to
a halogen atom (chlorine, bromine or iodine),
are preferred starting materials for generating
aryl radicals because they are widely available
and easy to handle. Aryl chlorides are the 
most preferred, but they are the most diffi- 
cult aryl halides to reduce — as reflected by 
their highly negative reduction potentials. 
Reduction potentials quantify the tendency 
of a compound to acquire electrons from 
other compounds; for example, the reduction 
potential of chlorobenzene, a simple aryl chloride,
is -2.78 volts relative to the potential of
a saturated calomel electrode (SCE), a stand- 
ard reference used in reduction-potential 
measurements.

It has not been possible to reduce aryl 
chlorides using a single PET process with 
visible light, because visible-light photons 
don’t have enough energy for the task. 
To reduce another compound, an excited 
photoredox catalyst must have an oxidation 
potential (a measure of its ability to lose 
electrons to other compounds) lower than 
the reduction potential of the compound 
to be reduced. 10-Phenylphenothiazine, for 
example, is one of the most strongly reduc- 
ing photoredox catalysts when excited by 
light, but the oxidation potential of excited 
phenothiazine is only -2.1 V relative to
SCE (versus SCE), which is insufficient to convert
chlorobenzene into aryl radicals, for instance.

Figure 1 | An excited neutral radical acts as a potent reductant. The strength of chemical reductants is
quantified by their oxidation potential, which is measured in volts relative to the potential of a reference
electrode (such as a saturated calomel electrode; SCE). Two of the strongest reductants are the alkali
metals sodium and lithium. Relatively strong reductants can also be produced from organic molecules
in light-driven processes called photoinduced electron transfers (PETs), but the oxidation potentials are
insufficiently negative for many reductions. More-negative values can be achieved using two consecutive
PET steps (see ref. 5, for example), or in electrophotoreduction processes that combine an electrochemical
step with a PET step. The mesitylacridinium ion (Mes-Acr⁺) can be converted into a radical (Mes-Acr•) when
irradiated by light of wavelength 450 nanometres in the presence of a sacrificial reductant. MacKenzie et al.
report that when this radical is irradiated by light of wavelength 390 nm, it produces an excited radical,
(Mes-Acr•)*, that is a potent reductant. Me, methyl group; tBu, tertiary butyl group.

To overcome this problem, various systems 
have been reported that involve the use of 
two consecutive PET steps (see ref. 5, for 
example). In these approaches, a ‘sacrificial’ 
reducing agent reduces the excited catalyst 
molecule produced in the first step, forming 
a radical anion that is then excited by another 
photon. The resulting excited radical anion 
is a strong reducing agent. For instance, the 
excited radical anion formed from the catalyst 
Rhodamine 6G has an oxidation potential of 
-2.4 V versus SCE, which is sufficiently nega- 
tive to reduce aryl bromides and aryl chlorides 
that have a reduction-facilitating group*®. 

MacKenzie et al. now report an approach 
based on a salt that contains a mesitylacridinium 
ion (Mes-Acr⁺; Fig. 1). Mesitylacridinium 
salts have been used for almost two decades in 
photo-oxidation reactions’ — when irradiated 
by visible light, the resulting excited species is 
a potent oxidant that takes an electron from 
a substrate and is thereby converted into an 
acridine radical (Mes-Acr•). The electrically 
neutral radical is converted back to Mes-Acr⁺ 
by an oxidant for subsequent catalytic cycles. 

The authors recognized that Mes-Acr• is 
a relatively stable species that absorbs light 
mainly from two ranges of wavelengths: 
350-400 nanometres and 450-550 nm. They 
report that, when Mes-Acr• is irradiated with 
light of wavelength 390 nm, it forms an excited 
neutral radical that acts as an extremely strong 
reducing agent, with a maximum oxidation 
potential of -3.36 V versus SCE. They propose 
that this large negative value is the result of 
charge transfer within the excited radical. 
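
As a rough aid to the wavelengths quoted above, the energy carried by a single photon can be estimated from E = hc/λ. The sketch below is illustrative only: the function name is hypothetical, and a photon energy in electronvolts is not an electrode potential, although it does bound the extra driving force that excitation can supply.

```python
# Illustrative sketch: photon energy E = h * c / wavelength, in electronvolts.
PLANCK_J_S = 6.62607015e-34       # Planck constant (J s), exact SI value
LIGHT_SPEED_M_S = 299_792_458     # speed of light in vacuum (m/s), exact SI value
JOULES_PER_EV = 1.602176634e-19   # joules per electronvolt, exact SI value


def photon_energy_ev(wavelength_nm: float) -> float:
    """Return the energy of one photon of the given wavelength, in eV."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / JOULES_PER_EV


# The two wavelengths mentioned in the article for the Mes-Acr system.
for nm in (390, 450):
    print(f"{nm} nm -> {photon_energy_ev(nm):.2f} eV")
```

On this estimate, a 390-nm photon carries roughly 3.18 eV and a 450-nm photon roughly 2.76 eV, which is consistent with the shorter wavelength being able to generate the more strongly reducing excited species.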

The use of an excited neutral organic radical 
is rare in photoredox catalysis. MacKenzie and 
colleagues formulated a reductive photo- 
catalytic cycle based on Mes-Acr’ using 
390-nm light and a sacrificial reducing agent. 
This system can carry out several reduction 
reactions, such as the removal of tosyl groups 
from tosylated amine compounds (a type of 
reaction commonly used in organic synthe- 
sis; see Fig. 3 of the paper’). The researchers 
demonstrated that the new system is robust 
enough to work on scales that are useful for 
preparing compounds in the laboratory, 
by performing a detosylation reaction with 
1.28 grams of a starting material. 

The same approach can also be used to 
replace bromine or chlorine atoms with hydro- 
gen atoms in aryl bromides and chlorides, 
respectively — such reactions are known as 
dehalogenations (see Fig. 2 of the paper’). This 
procedure is possible when various groups are 
present in the substrates, and it even works 
with 4-chloroanisole, an aryl chloride that has 
a reduction potential of -2.9 V versus SCE. 
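
The rule of thumb used throughout this article (an excited catalyst can reduce a substrate only if its oxidation potential is more negative than the substrate's reduction potential) can be sketched with the values quoted in the text. The dictionary and function names are illustrative assumptions, and the comparison is purely thermodynamic: it ignores kinetics, excited-state lifetimes and other practical factors.

```python
# Potentials in volts versus SCE, as quoted in the article text.
REDUCTANT_OXIDATION_V = {
    "excited 10-phenylphenothiazine": -2.1,
    "excited Rhodamine 6G radical anion": -2.4,
    "excited Mes-Acr radical (maximum)": -3.36,
}
SUBSTRATE_REDUCTION_V = {
    "chlorobenzene": -2.78,
    "4-chloroanisole": -2.9,
}


def can_reduce(oxidation_v: float, reduction_v: float) -> bool:
    """True if the reductant's oxidation potential is lower (more negative)
    than the reduction potential of the substrate."""
    return oxidation_v < reduction_v


# Compare every reductant with every substrate.
for reductant, e_ox in REDUCTANT_OXIDATION_V.items():
    for substrate, e_red in SUBSTRATE_REDUCTION_V.items():
        verdict = "feasible" if can_reduce(e_ox, e_red) else "not feasible"
        print(f"{reductant} ({e_ox} V) + {substrate} ({e_red} V): {verdict}")
```

Of the three reductants listed, only the excited neutral radical has a potential negative enough for both aryl chlorides, in line with the dehalogenation results described above.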

Another approach for the catalytic pro- 
duction of strongly reducing species was 
reported simultaneously earlier this year in 
two papers from different groups®”. In both 
cases, a neutral organic molecule acts as the 
catalyst; this is reduced electrochemically on 
a cathode to produce a radical anion, which is 
then excited by visible light to form a strong 
reductant with an oxidation potential more 
negative than -3.0 V versus SCE. These elec- 
trophotochemical systems were used to dehal- 
ogenate electron-rich aryl chlorides, and also 
in a series of arylation reactions (transformations 
in which an aryl group is attached to 
another molecule). 

The use of electrochemical reduction, 
instead of photochemical methods, to gen- 
erate radicals allows catalysts to be used 
that do not absorb visible light. For example, 
naphthalene monoimide, a catalyst used in 
one’ of the two papers, falls into this category 
and cannot undergo the initial conversion to 
a radical anion using visible light. By contrast, 
once it is transformed electrochemically into 
a visible-light-absorbing radical, it can enter a 
photocatalytic cycle. 

MacKenzie and colleagues’ observation 
of the strong reductant character of excited 
neutral Mes-Acr• will inspire investigations 
into whether other molecules show similar 
behaviour. One can also expect increased 
interest in other photocatalytic approaches 
for the production of reductive systems!” 8. 
Taking into account the highly negative 
oxidation potentials observed for various 
light-generated agents in the current work 
and by other research groups, we can look 
forward to new arylation reactions, and even 
to ambitious applications such as the Birch 
reduction" — a classic synthetic reaction typically 
performed using alkali metals. 


Radek Cibulka is in the Department of 
Organic Chemistry, University of Chemistry 
and Technology, Prague, Prague 16628, 
Czech Republic. 

e-mail: cibulkar@vscht.cz 


Developmental biology 


A clock that controls 
human spine development 


Adelaida Palla & Helen Blau 


Visualization of the rhythmic oscillations of the mouse and 
human segmentation clocks, which are crucial to spine 
development, is now possible thanks to the development of 
sophisticated cell-culture systems. See p.113, p.119 & p.124 


What do the flashes of a firefly and the 
chirpings of a cricket have in common? Both 
occur in a regular rhythm, which is controlled 
by an oscillating biological clock’. Another 
oscillating genetic clock controls the develop- 
ment of embryonic structures called somites, 
which give rise to the vertebrae that protect 
the spinal cord. Our knowledge of this seg- 
mentation clock stems almost entirely from 
research on animals”?, because technical 
and ethical considerations limit the study of 
human embryos in culture. Diaz-Cuadros et al.* 
(page 113) and Matsuda et al.° (page 124) now 
report a breakthrough that enables studies 
of the human segmentation clock in vitro. In 
addition, Yoshioka-Kobayashi et al.° (page 119) 
use sophisticated techniques in mice to pro- 
vide insights into the mechanisms that control 
the mammalian segmentation clock. 
Somites arise from a tissue called the 
presomitic mesoderm (PSM). During somite 
formation, temporally and spatially con- 
trolled oscillations in transcription yield 
gene-expression waves that propagate through 
the PSM along the embryo’s head-to-tail axis. 
The result is a striped pattern of somites that 
forms the blueprint for the spine. Although the 
molecular components of the segmentation 
clock are highly evolutionarily conserved 
across vertebrates, new somites form with dif- 
ferent rhythms in each species. For instance, 
gene oscillations have a period of 30 minutes 
in zebrafish and 2 hours in mice. Oscillations 
have been estimated to occur every 4 to 5 hours 
in humans? — although until now they have 
never been directly observed. 

Diaz-Cuadros et al. and Matsuda et al. set 
out to model the human clock using induced 
pluripotent stem cells (iPSCs) — cells that are 
generated in vitro from differentiated human 
cells and, similarly to embryonic stem cells, 
can give rise to every cell type in the body. 
The groups used established protocols’ ’ to 
convert iPSCs into PSM in vitro. 

To visualize and monitor the dynamic 
oscillations of clock genes in the cultured 
PSM in real time, each group used a different 
‘reporter’ protein. Matsuda and colleagues 
used a reporter in which a key segmenta- 
tion-clock gene”, Hes7, drives production of 
the bioluminescent enzyme luciferase. As Hes7 
expression oscillates, levels of the reporter 
increase and decrease. Diaz-Cuadros et al. 
used an engineered version of Hes7 fused to 
a gene that encodes Achilles, which is a more 
rapidly generated variant of yellow fluorescent 
protein developed by Yoshioka-Kobayashi 
and colleagues. The use of Achilles enabled 
Diaz-Cuadros and co-workers to track fluores- 
cent waves of Hes7 expression at the single-cell 
level* — a resolution not possible with the luciferase 
reporter. Analyses using both reporters 
provide the first definitive evidence that the 
human segmentation clock has a period of 
approximately 5 hours (Fig. 1a). 
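
To make concrete what 'measuring the period' of a reporter trace involves, the following sketch estimates a period as the mean spacing between successive peaks. It is a minimal illustration on a synthetic, noise-free signal, not the analysis used in the papers; the function name and the sampling interval are assumptions, and real luminescence or fluorescence traces would need detrending and smoothing first.

```python
import math


def estimate_period(times, signal):
    """Estimate the period as the mean spacing between local maxima."""
    peak_times = [
        times[i]
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]
    if len(peak_times) < 2:
        raise ValueError("need at least two peaks to estimate a period")
    gaps = [later - earlier for earlier, later in zip(peak_times, peak_times[1:])]
    return sum(gaps) / len(gaps)


# Synthetic reporter trace: a 5-hour oscillation sampled every 15 minutes
# for 30 hours (hypothetical values chosen to mimic the human clock).
TRUE_PERIOD_H = 5.0
times = [0.25 * i for i in range(121)]
signal = [1.0 + math.sin(2 * math.pi * t / TRUE_PERIOD_H) for t in times]

print(f"estimated period: {estimate_period(times, signal):.2f} h")
```

The same function applied to a trace with a 2-hour rhythm would return the mouse-like period, which is the kind of cross-species comparison the culture systems make possible.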

Three key signalling pathways — the Notch, 
Wnt and FGF pathways — act in sequential negative 
feedback loops to regulate oscillating gene 
expression during somite formation??"”. 
Diaz-Cuadros and colleagues used their culture 
system to investigate these pathways in detail. 
They confirmed the roles of these pathways 
in PSM cells taken from mouse embryos, and 
then showed that similar pathways govern seg- 
mentation in human PSM differentiated from 
iPSCs, with oscillations dependent on Notch 
signalling and another pathway, mediated 
by a protein called YAP. They found that FGF 
signalling not only determines the positions 
along the body axis at which oscillations stop, 
as previously reported’, but also regulates the 
complex dynamics of the oscillations — their 
period, phase and amplitude. 

Matsuda and colleagues used their culture 
protocol to study a human genetic disease, 
congenital spondylocostal dysostosis, in 
which defects in segmentation of the ver- 
tebrae lead to skeletal anomalies”. The 
authors generated PSM from iPSCs derived 
from two people with the disease, who each 
had mutations in a different gene of the Notch 
signalling pathway. Surprisingly, despite these 
mutations and differences in overall gene 
expression, the authors observed normal 
oscillations in the PSM. By contrast, when 
the authors produced PSM from cells genetically 
engineered to carry a Hes7 mutation that 
had previously been identified as a cause of 
spondylocostal dysostosis®, they observed a 
dramatic loss of oscillations (Fig. 1b). This work 
highlights the potential of using iPSC-derived 
PSM to determine the relative roles of various 
clock components in development. 

Figure 1 | Modelling embryonic segmentation in vitro. A tissue called the presomitic mesoderm (PSM) 
gives rise to somites — embryonic precursors of vertebrae. This process involves a ‘segmentation clock’ that 
drives rhythmic oscillations of gene expression, including that of the gene Hes7. Three groups have developed 
systems to analyse the clock in culture using live-cell imaging. a, Diaz-Cuadros et al.* and Matsuda et al. 
directed wild-type (WT) human induced pluripotent stem cells (iPSCs) to become PSM cells. The iPSCs had 
been engineered to express a version of Hes7 that drives expression (arrow) of genes encoding the fluorescent 
molecule Achilles* or the luminescent molecule luciferase’. Monitoring the oscillations of these genes in PSM 
cells revealed that the human segmentation clock has a period of about 5 hours. b, Matsuda et al. performed 
the same experiment using iPSCs in which Hes7 is mutated, as in the skeletal disorder spondylocostal 
dysostosis, and found a lack of oscillations. c, Yoshioka-Kobayashi et al.‘ isolated the PSM from mouse 
embryos carrying a Hes7-Achilles reporter, and monitored oscillations, which have a 2-hour period. 

It is known that, although individual PSM 
cells show autonomous oscillations, Notch 
signalling between cell neighbours syn- 
chronizes these oscillations'”* to produce 
gene-expression waves at the population 
level. Yoshioka-Kobayashi et al. set out to 
examine this role for Notch signalling in 
detail. The authors engineered mice to carry 
a Hes7-Achilles reporter, and to lack a pro- 
tein called Lunatic fringe that modulates 
Notch signalling. They then isolated the entire 
PSM from embryos that lacked Lunatic fringe 
and from controls that did not, and made use 
of optogenetics, a light-triggered gene-expression 
system, to visualize somite development 
in culture by tracking Hes7 oscillations over 
time (Fig. 1c). Although the autonomous oscil- 
lations of single PSM cells were unaffected by 
loss of Lunatic fringe, the researchers observed 
oscillation defects at the population level. 

Notch signalling involves the release of the 
protein DLL1 from one cell and its binding by 
Notch receptors on another. This interaction 
triggers a downstream signalling cascade in 
the receiving cell that causes increases in the 
expression of various genes, including Hes1 
(ref. 17). This sender-receiver system can be 
modulated using a genetically engineered 
optogenetic variant of the Dll1 gene that 
is expressed in response to stimulation by 
light’’. The authors stimulated Dll1, and compared 
how long it took for neighbouring cells 
to exhibit Hes1 upregulation in mice lacking 
Lunatic fringe with the time it took in controls. 
The study revealed that Lunatic fringe controls 
population-level oscillations by regulating the 
timing and amplitude of the signal-sending 
and signal-receiving process in adjacent cells. 
This work underscores the intricate role of 
Notch components in the cell-cell interactions 
that control clock oscillations. 

Together, the current studies provide a 
remarkable demonstration that simple iPSC 
culture systems can be used for in-depth 
analysis of the oscillatory gene expression 
associated with somite segmentation at 
single-cell resolution. However, they also 
have limitations. For instance, Diaz-Cuadros 
et al. and Matsuda et al. did not observe final 
stages of somite development and vertebra 
formation in their human culture systems. 
Nonetheless, their protocols will undoubt- 
edly help to advance our understanding of the 
molecular basis of normal segmentation and 
to reveal the genes that, when mutated, lead 
to the development of disorders of the spine. 

More broadly, gene-regulatory networks are 
highly conserved between mammals, regard- 
less of the animals’ size or whether they are 
bipedal or quadrupedal. This is in stark con- 
trast to the species-specific timing of gene 
oscillations, which is fundamental to body- 
plan development. What causes these crucial 
differences in timing remains an enigma — but 
one that can now begin to be unravelled. 


Adelaida Palla and Helen Blau are in the Baxter 
Laboratory for Stem Cell Biology, Department 
of Microbiology and Immunology, Institute for 
Stem Cell Biology and Regenerative Medicine, 
Stanford School of Medicine, Stanford, 

e-mail: hblau@stanford.edu 


Predators on track 
for ocean protection 


Ana M. M. Sequeira 


Satellite tracking of marine predators in the Southern Ocean 
has revealed key ecological areas under disproportionate 
pressure from human activities. These results show the value 
of tracking data for informing conservation efforts. See p.87 


Even the most remote marine ecosystems on 
Earth — such as those at high latitudes, including 
in the Southern Ocean around Antarctica 
— can no longer be considered pristine’. The 
effects of humans on marine ecosystems now 
have a global footprint? *, and mitigation of 
associated threats requires knowledge of the 
areas of particular ecological and biological 
significance. Such areas sustain the healthy 
functioning of marine ecosystems and should 
therefore be protected. On page 87, Hindell 
et al.⁵ report analyses of tracking data for 
marine species that reveal these key areas in 
the Southern Ocean. 

The waters of the Southern Ocean encircle 
the Earth through the Drake Passage, the ocean 
region between the tip of South America and 
Antarctica. Because of this passage, the Southern 
Ocean has a key role in global climate and 
ocean circulation®. This ocean is also home 
to a unique range of marine fauna, including 
many charismatic predators, such as penguins 
(Fig. 1) and seals, as well as the precious 
Antarctic krill (Euphausia superba). These krill 
are at the base of the marine food web, and, 
alongside species of toothfish (Dissostichus 
eleginoides and Dissostichus mawsoni), are 
the target of the largest fishing industries in 
the Southern Ocean’®. The fisheries compete 
with animals for food resources, and fishing 
activities along with the pressures from 
climate change are raising concerns about 
the possibility of ecosystem collapses there®”’. 

The Commission for the Conservation of 
Antarctic Marine Living Resources is the main 
management body for the Southern Ocean, 
and is tasked with ensuring the preservation 
of this ecosystem. To succeed, the commission 
needs to take precautionary steps, including 
the establishment of more and better-designed 
marine reserves as has been suggested®, and 
sites for these should be chosen on the basis of 
knowledge of the whereabouts of ecologically 
significant marine areas’°. However, accurately 
defining these areas in a highly dynamic, 
changing environment is challenging. 

Monitoring predators at the top of a marine 
food web can help with this task. Such preda- 
tors migrate within and between ecosystems, 
and can be used as indicator species" — those 
able to provide information on the status of 
an ecosystem or habitat if alterations occur in 
their movement patterns, behaviour or repro- 
ductive success. In particular, tracking top 
predators can assist with identifying the areas 
that they use most, which can be considered 
as regions of great ecosystem importance, not 
only for the predators but also for a wide range 
of other species". Indeed, tracking data are 
increasingly being used to inform conserva- 
tion policy around the world”, and have been 
used to quantify the extent of spatial over- 
laps between species and fishing activities 
globally’. 

Hindell et al. report analyses of tracking 
data from 4,060 individuals of 17 species of 
marine predators (seabirds and mammals), 
and suggest a way to use such data to predict 
key ecological regions in the Southern Ocean. 
Tracking data were collected between 1991 and 
2016 using electronic tags attached to the 
animals. These tags provided location esti- 
mates (obtained using satellite information 
or other methods) as the animals migrated. 
The authors used some of these data (for 
2,823 individuals) to develop predictive 
models to identify crucial habitats in the 
Antarctic region for all of the predator species 
combined. These integrated results provide a 
spatially defined assessment of areas of high 
biodiversity that includes species across 
multiple levels of the food chain (termed 
trophic levels) in the Southern Ocean. 

Defining a single, integrated result from such 
varied data sets and from so many species is a 
complex undertaking. Predators in the South- 
ern Ocean include a large range of species from 
across different taxonomic groups. These 
include species living in the Antarctic region 
and species residing immediately north of it 
(in the sub-Antarctic), all with different diets 
and patterns of movement. The authors used 
a series of data-processing steps to generate a 
value they termed ‘habitat importance’, which 
they predicted using data across all of these 
species together (assemblage-level maps). To 
do this, Hindell and colleagues first mapped 
habitat importance for the species living in the 
Antarctic separately from those living in the 
sub-Antarctic, and then selected the maximum 
habitat-suitability values in those two maps to 
generate an overall assemblage-level map for 
all of the predator species combined. 

Finally, the authors defined the regions in 
the top 10% of their calculated habitat impor- 
tance value as the areas of the most ecological 
significance in the Southern Ocean. This final 
step was a central part of their study. It enabled 
comparisons to be made between the areas of 
ecological significance and the areas affected 
by human activities, as well as between the lev- 
els of existing protection inside and outside 
these areas. 
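
The two final steps described above (taking the cell-by-cell maximum of the Antarctic and sub-Antarctic maps, then keeping the top 10% of values) can be sketched on a toy grid. This is a minimal illustration under stated assumptions: the grids, values and function names are hypothetical, and the authors' actual workflow involves habitat models and further processing that are not reproduced here.

```python
def combine_maps(antarctic, sub_antarctic):
    """Cell-by-cell maximum of two equally sized habitat-importance grids."""
    return [
        [max(a, s) for a, s in zip(row_a, row_s)]
        for row_a, row_s in zip(antarctic, sub_antarctic)
    ]


def top_fraction_mask(grid, fraction=0.10):
    """Flag the cells whose value falls within the top `fraction` of all cells.

    Ties at the threshold are all flagged, so slightly more than `fraction`
    of the cells can be selected.
    """
    values = sorted((v for row in grid for v in row), reverse=True)
    n_top = max(1, round(len(values) * fraction))
    threshold = values[n_top - 1]
    return [[v >= threshold for v in row] for row in grid]


# Hypothetical habitat-importance scores (0 to 1) on a 2 x 5 grid of ocean cells.
antarctic = [[0.1, 0.4, 0.9, 0.2, 0.3],
             [0.5, 0.6, 0.1, 0.0, 0.2]]
sub_antarctic = [[0.3, 0.2, 0.5, 0.8, 0.1],
                 [0.4, 0.7, 0.6, 0.3, 0.95]]

combined = combine_maps(antarctic, sub_antarctic)
significant = top_fraction_mask(combined, fraction=0.10)
for row in significant:
    print(row)
```

Once such a mask exists, comparing human pressures or existing protection inside and outside the flagged cells is a matter of summarizing other gridded data sets over the same cells.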

Hindell and co-workers report that the 
predicted areas of ecological significance they 
identified match the ocean regions of known 
elevated productivity for Antarctic krill’ and 
for other organisms at the base of the food 
web, including myctophids (lanternfish)™. 
This result is consistent with the idea that 
marine predators can be used as indicators 
to identify areas of ecological significance. 
The authors report the particularly striking 
finding that a disproportionately higher level 
of human pressures (fishing and the effects of 
climate change) occurred inside rather than 
outside the areas identified as being of eco- 
logical significance. On the basis of this, the 
authors recommend that the current network 
of protected marine areas in the Southern 
Ocean be extended. They confirm that these 
extensions should include the areas for which 
protection is already being planned. 

It would have been interesting if the authors 
had suggested how an approach similar to 
theirs could best be used to tackle comparable 
problems on a global scale. For example, the 
authors’ views on the best strategy for contributing 
scientific knowledge to inform efforts to 
protect biodiversity on the high seas (the waters 
outside national jurisdictions) would have been 
a valuable addition. This topical issue is currently 
being discussed by the United Nations 
General Assembly, and negotiations are under 
way to develop an international legally binding 
solution to address the problem”. 

Figure 1 | Emperor penguins (Aptenodytes forsteri) in Antarctica. Hindell et al.⁵ report analyses of tracking data for marine predators, including this penguin 
species. The authors’ results pinpoint regions of the Southern Ocean around Antarctica that should be protected. 

Scientists have tracked marine predators for 
decades**”. It is time to pool all these existing 
data sets to address pressing conservation 
challenges on a global scale. To succeed, a 
worldwide movement is needed within the 
community of animal-tracking researchers, to 
drive the sharing of these data and to combine 
them with information about human activi- 
ties at sea. Combining such information will 
deliver much-needed evidence of the extent 
of existing threats, to inform managers and 
policymakers in a timely manner. As Hindell 
and colleagues state, the Southern Ocean has 
the potential to provide an example of how 
“science, policy and management can interact 
to meet the challenges of a changing planet”, 
and their work highlights a pathway for how 
best to direct policy efforts. 

© 2020 Springer Nature Limited. All rights reserved. 


Ana M. M. Sequeira is at the Oceans Institute 
and School of Biological Sciences, University 
of Western Australia, Perth, Western Australia 
6009, Australia. 

e-mail: ana.sequeira@uwa.edu.au 


1. Blight, L. K. & Ainley, D. G. Science 321, 1443 (2008). 
2. Halpern, B. S. et al. Science 319, 948-952 (2008). 
3. Queiroz, N. et al. Nature 572, 461-466 (2019). 
4. Sequeira, A. M. M. et al. Front. Mar. Sci. 6, 639 (2019). 
5. Hindell, M. A. et al. Nature 580, 87-92 (2020). 
6. Rintoul, S. R. Nature 558, 209-218 (2018). 
7. Nicol, S., Foster, J. & Kawaguchi, S. Fish Fish. 13, 30-40 (2012). 
8. Brooks, C. M. et al. Nature 558, 177-180 (2018). 
9. Atkinson, A. et al. Nature Clim. Change 9, 142-147 (2019). 
10. Dunstan, P. K. et al. Ocean Coastal Mgmt 121, 116-127 (2016). 
11. Sergio, F. et al. Annu. Rev. Ecol. Evol. Systemat. 39, 1-19 (2008). 
12. Hays, G. C. et al. Trends Ecol. Evol. 34, 459-473 (2019). 
13. Atkinson, A. et al. Mar. Ecol. Prog. Ser. 362, 1-23 (2008). 
14. Freer, J. J., Tarling, G. A., Collins, M. A., Partridge, J. C. & Genner, M. J. Diversity Distrib. 25, 1259-1272 (2019). 
15. Wright, G. et al. Mar. Policy https://doi.org/10.1016/j.marpol.2018.12.003 (2019). 


This article was published online on 18 March 2020. 


Nature | Vol 580 | 2 April 2020 | 35 


News & views 


Cell biology 


Ghostly metabolic 
messages from dying cells 


Douglas R. Green 


Cell death by a process called apoptosis inhibits inflammation 
in surrounding tissue. The finding that dying apoptotic cells 
release a tailored cocktail of metabolite molecules reveals a 
way in which they influence their living neighbours. See p.130 


“Marley was dead, to begin with. There is no 
doubt whatever about that.” The iconic open- 
ing lines of Charles Dickens’s novel A Christmas 
Carol convey the idea of the finality of death, a 
concept that pervades our thinking even when 
considering the demise of cells. A dead cell is, 
to echo Dickens’s description of Marley, “as 
dead as a doornail”. But just as Marley had an 
influence from beyond the grave to change the 
character of Ebenezer Scrooge in the story, 
cells that die can have a vital effect on the 
living cells around them. Medina et al.’ bring 
this process to life on page 130 by uncovering 
metabolic processes in dying cells that have 
important consequences for the organism. 

Every second, millions of cells die in our 
bodies owing to processes that are a normal 
part of life, such as tissue turnover and 
responses to environmental stresses”. The vast 
majority of these deaths occur by a process 
called apoptosis. This is a form of cellular 
suicide that is orchestrated by the actions 
of enzymes called caspases, which cleave 
hundreds of different intracellular proteins’. 
This regulated cleavage of various caspase 
targets effectively ‘packages’ the dying cell 
through an orderly dismantling process. 
DNA in the nucleus is cut into small pieces, 
the cytoplasmic ‘skeleton’ of filamentous 
actin protein is remodelled to break the cell 
into smaller fragments, and the exposure of 
a particular lipid on the cell surface signals to 
immune cells, such as macrophages, to take 
up (engulf) and digest the dying cell’. 

Ever since the original description of 
apoptosis’, it has been known that this form 
of cell death does not trigger an inflammatory 
response, as occurs in other types of cell death, 
such as necrosis. Subsequent research’ con- 
firmed that apoptotic cell death is anti-inflam- 
matory, leading to proposals that the injection 
of apoptotic cells might be used to control 
inflammatory disease. The inflammation 
caused by necrotic cell death has been attrib- 
uted to the release of molecules called dam- 
age-associated molecular patterns (DAMPs), 
of which several have been identified®. By 
contrast, little is known about the mechanism 
underlying the anti-inflammatory proper- 
ties of apoptotic cells. The engulfment of 
apoptotic cells by macrophages promotes 
tissue repair’, and the apoptosis-associated 
molecules responsible for this effect are 
unknown. 

Medina and colleagues discovered that, 
during the apoptosis of mammalian cells 
(including human cells) grown in vitro, small 
molecules released from the dying cells 
can induce macrophages to express genes 
involved in tissue repair and the inhibition 
of inflammation. The authors speculated 
that metabolites — molecules arising from 
metabolic processes — were responsible for 
this effect. By profiling different cell types 
undergoing apoptosis in response to different 
triggers, Medina et al. identified metabo- 
lites that were consistently released from 
all dying cells, whereas other metabolites in 
the cells were not released. This specificity 
was due, at least in part, to the selectivity of 
a particular protein channel on the cell sur- 
face, pannexin 1 (PANX1), which opens when 
it is cleaved by caspases®. Apoptotic cells 
engineered to lack PANX1 did not release the 
apoptosis-associated metabolites. 

The authors examined six metabolites 
released from all apoptotic cells and found that 
none, individually, had a significant effect on 
the gene-expression profile of macrophages. 
However, administration of all six had a robust 
effect on the gene-expression pattern, and a 
similar expression profile could be induced, 
at least partially, by exposing macrophages 
to a mixture of just three of the metabolites: 
spermidine, guanosine monophosphate and 
inosine monophosphate. The authors report 
that administering a mixture of these three 
metabolites had remarkable anti-inflamma- 
tory effects in vivo — inhibiting disease in a 
mouse model of arthritis and limiting the 
rejection of lung transplants in mice. 

Figure 1 | Cells that die by a process called apoptosis signal to neighbouring cells. Medina et al.' report 
that dying human apoptotic cells release molecules produced by metabolic processes, and that these 
metabolites have anti-inflammatory and tissue-repair effects. a, In healthy human cells, the amino-acid 
arginine is often converted to the molecule ornithine, which is either used in a pathway that generates 
the molecule spermidine or transported into a mitochondrion (a type of organelle), where it is converted 
to citrulline and other metabolites. Until the cell starts to die, a channel protein on the cell surface called 
pannexin 1 (PANX1) remains closed. b, As the cell undergoes apoptosis, the core apoptotic machinery 
activates enzymes termed caspases, which cleave PANX1, and the channel then opens. Production of the 
molecules spermidine and putrescine becomes higher than normal. One possible way to explain this is if 
the core apoptotic machinery prevents ornithine from entering the mitochondrion and instead diverts it 
towards spermidine production. Spermidine and other specific metabolites (not shown) are selectively 
released through PANX1 and influence adjacent cells. 

Spermidine is a type of molecule called a 
polyamine. It is mainly produced from a metabolic 
pathway that converts the amino acid 
arginine to polyamines through intermediates 
that include the molecule ornithine (Fig. 1). 
Medina and colleagues traced the conversion 
of arginine to spermidine by this pathway, and 
found that cells induced to undergo apoptosis 
increased their synthesis of spermidine and its 
precursor, the molecule putrescine, before 
dying. The apoptotic cells released spermi- 
dine, but not putrescine. Spermidine release 
occurred in a PANX1-dependent manner. 

Although this phenomenon was monitored 
using just one apoptosis-inducing condition 
(namely, ultraviolet radiation), the finding 
raises the possibility that activation of apop- 
tosis drives this pathway, which synthesizes 
spermidine. The hint that suggests this is the 
authors’ observation of the effects of admin- 
istering a type of drug called a BH3 mimetic. 
This drug directly triggers a core step in apop- 
tosis, the permeabilization of mitochondrial 
organelles in an event called mitochondrial 
outer membrane permeabilization (MOMP) — 
and its use led to spermidine release at levels 
comparable to those observed in apoptosis 
mediated by ultraviolet radiation. Perhaps 
MOMP prevents the transport of ornithine into
mitochondria (where ornithine is converted 
to the molecule citrulline), and leads instead 
to ornithine being mobilized in cytoplasmic 
pathways leading to spermidine production. 
This model could be tested in cells engineered 
to lack components required for MOMP and 
exposed to BH3 mimetics. 

The molecule urea is formed as a by-product 
of the conversion of arginine to ornithine. 
Urea is an inflammatory DAMP that is released
from necrotic cells®, but the authors did not 
determine whether urea is released through 
PANX1 during apoptosis. However, because 
Medina and colleagues observed a rise in arginine
metabolism during apoptosis, if urea is
not released through PANX1, this might provide
a further reason why apoptosis is not
inflammatory. 

How do spermidine, guanosine mono- 
phosphate and inosine monophosphate 
induce responses in macrophages, and why 
do the three metabolites work only when given
together? Guanosine monophosphate and 
inosine monophosphate are known to signal to 
G-protein-coupled adenosine receptors’, and 
spermidine can participate in a broad range of 
activities. The molecule inosine (which can be
derived from inosine monophosphate) has 
anti-inflammatory effects’ and can prevent 
lethal inflammation in response to a bacterial 
toxin in mice”. It is possible that spermidine 
acts to increase such anti-inflammatory 
signalling from the adenosine receptors. 
Human cells are ten times less sensitive than 
mouse cells to the anti-inflammatory effects 
of inosine, probably owing to differences in
adenosine-receptor expression and function
between the species®, and therefore efforts to 
use these metabolites to treat human disease 
might prove challenging. 

Medina and colleagues’ work opens rich 
possibilities for future investigations into 
how apoptosis triggers metabolic changes, 
and how the regulated release of metabolites 
influences tissues. In contrast to apoptosis, 
other forms of cell death, such as regulated 
forms of necrosis, have profoundly different 
effects on surrounding cells, and whether 
and how changes in metabolism triggered 
by those cell-death pathways influence their 
surroundings is unknown. Cells that die by a 
form of regulated necrosis termed necroptosis 
continue to synthesize and secrete molecules 
called cytokines that affect inflammation”. In 
these dead ‘zombie’ cells, this synthesis occurs 
in an organelle called the endoplasmic
reticulum", raising the possibility that metabolites
produced in the functioning endoplasmic 
reticulum of these zombie cells also signal to 
living cells in the surrounding tissue. Marley’s
ghost appears in chains that he said were
forged in life; what other chains are forged in
cell death?


Nuclear physics


A broken nuclear mirror


Bertram Blank 


The principle of mirror symmetry, which states that nuclear 
structure remains the same when protons are swapped for 
neutrons and vice versa, has been found to be broken in the 
lowest-energy forms of a mirror pair of nuclei. See p.52 


Nature likes symmetry. Examples range across 
size scales from macroscopic objects, such 
as spiderwebs or honeycombs, to the microscopic
world with its arrangement of atoms in
molecules, or of electrons around an atomic 
nucleus. Symmetry also exists at the level of 
nuclei, but on page 52, Hoff et al.¹ report one
way of breaking it. 

Atomic nuclei are composed of two different
types of particle — protons and neutrons — 
which, if we ignore the charge on the proton, 
resemble each other so much that they are 
often treated as a single particle, the nucleon. 
Mirror pairs of nuclei, in which the numbers of
neutrons and protons have been exchanged, 
therefore have similar properties. 

In particular, the sequence of energies of 
a mirror pair’s nuclear states should be the 
same, from the ground state in which the 
nucleons are in the lowest possible energy 
level, to excited states of increasing energy’. 
A change in this sequence has, however, pre- 
viously been observed for excited states of 
mirror partners*. Hoff and co-workers now
report the breaking of mirror symmetry at the
level of bound nuclear ground states (Fig. 1).
They report that the ground states of the
mirror partners bromine-73 and strontium-73
are not simply ‘mirror images’ in which protons
and neutrons have been swapped, but
have a different configuration of protons and
neutrons.


© 2020 Springer Nature Limited. All rights reserved.

How does this difference arise? The most 
basic building blocks of matter known today 
are quarks, of which there are six types. Pro- 
tons and neutrons are both constructed 
from three quarks, and the most important 
difference between them is that their different 
quark combinations give the proton an elec- 
tric charge of +1, whereas the neutron ends 
up neutral. 

The strong nuclear interaction that binds 
nucleons together in an atomic nucleus is 
essentially the same between protons and 
neutrons. For protons, however, the elec- 
tric repulsion between identically charged 
particles adds together. When building two 
mirror-symmetric atomic nuclei, one with
Z protons and N neutrons and the other with
N protons and Z neutrons, this repulsion adds
an extra global energy (mass) to the nucleus
that has the more protons, but does not
modify the arrangement of protons and neutrons.
This symmetry explains why several of
the properties of mirror partners are nearly
identical: their shape; their behaviour when
excited (that is, when energy is added); and
the properties of the decay processes through
which unstable nuclei lose energy by emitting
particles or radiation.


Figure 1| Breaking nuclear mirror symmetry. a, In a pair of mirror nuclei, the number of protons in one
nucleus equals the number of neutrons in the other, and vice versa. For perfect mirror symmetry, the nuclear
structure and energy levels of the ground and excited states (shown schematically; dashed lines connect
equivalent states) are essentially the same on swapping protons for neutrons, apart from a small overall shift
caused by proton repulsion in the proton-rich nucleus. b, Hoff et al.¹ report that the lowest-energy states of
a mirror pair can have a different configuration of protons and neutrons; red dashed lines indicate that the
lowest energy levels in one nucleus have swapped places compared with a. The cartoon illustrates a simple
example of mirror symmetry and how it might be broken.

To determine nuclear properties such as 
energy levels, energy is pumped into a nucleus
(for instance, by colliding it with another 
nucleus), and the decay process in which 
γ-rays are emitted from the resulting excited
nucleus is observed. The previously observed 
difference’ in the sequence of energy levels for 
the excited states of mirror partners occurred 
particularly at higher excitation energies, in 
which the density of states increases (that is, 
the neighbouring states come closer to each 
other). This difference of energy levels is a sign
that mirror symmetry is only approximate and 
can be broken in particular circumstances. 

A different structure in nuclear ground 
states has been observed* previously for 
only one pair of mirror nuclei, nitrogen-16 and 
fluorine-16. In that case, however, one of the 
two partners (fluorine-16) is unbound — that 
is, the repulsion between protons outweighs 
the attraction from the strong nuclear force. It 
therefore decays rapidly by ejecting a proton 
in around 10 ~ seconds (ref. 5), comparable 
to the time it takes a nucleon to travel across 
the nucleus. However, nitrogen-16 is much 
more stable, with a half-life of about 7 sec- 
onds (ref. 6). So the mirror difference there 
can be explained by the unbound nature of 
one partner. 

Hoff et al. reveal that the situation is
different for bromine-73 and strontium-73,
because both are long-lived and quasi-stable. 
To break mirror symmetry, nature had to play 
a trick: the ground states of these two nuclei 
are very close in energy to their respective first 
excited states. Mirror symmetry, being only 
an approximate symmetry, can therefore be 
violated by exchanging the ground and the 
first excited states in one of the two nuclei. 

The properties of bromine-73 have been well 
characterized for 50 years’, whereas informa- 
tion about strontium-73 is limited: we have a 
rough value for its half-life’, and know its 
strongest mode of decay”. The originality of 
Hoff and co-workers’ study is that the authors 
did not study the properties of strontium-73 
directly, but through its two consecutive radioactive
decays: the first decay occurs through
the emission of β-particles and produces
a particular state in the daughter nucleus, 
rubidium-73, which immediately decays by 
proton emission to produce krypton-72. The 
observed properties of the proton emission 
allowed the authors to deduce the structure of 
the proton-emitting state in rubidium-73, and, 
from this, the structure of the ground state of 
strontium-73. 
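
The bookkeeping behind this decay chain, and behind the idea of a mirror partner, can be sketched in a few lines of Python. This is an illustrative sketch only: the class and method names are hypothetical, and it assumes that the β-decay of proton-rich strontium-73 is β⁺ decay (a proton converted to a neutron). The atomic numbers are standard values.

```python
from dataclasses import dataclass

# Atomic numbers of the elements mentioned in the text (standard values).
SYMBOL_BY_Z = {7: "N", 9: "F", 35: "Br", 36: "Kr", 37: "Rb", 38: "Sr"}


@dataclass(frozen=True)
class Nuclide:
    """Hypothetical helper: a nucleus described by its proton and neutron numbers."""
    z: int  # number of protons
    n: int  # number of neutrons

    @property
    def mass_number(self) -> int:
        return self.z + self.n

    def label(self) -> str:
        symbol = SYMBOL_BY_Z.get(self.z, "Z" + str(self.z))
        return symbol + "-" + str(self.mass_number)

    def mirror(self) -> "Nuclide":
        # Mirror partner: proton and neutron numbers exchanged.
        return Nuclide(z=self.n, n=self.z)

    def beta_plus(self) -> "Nuclide":
        # Assumed beta-plus decay: one proton becomes a neutron.
        return Nuclide(z=self.z - 1, n=self.n + 1)

    def proton_emission(self) -> "Nuclide":
        # Proton emission: one proton leaves the nucleus.
        return Nuclide(z=self.z - 1, n=self.n)


sr73 = Nuclide(z=38, n=35)
rb73 = sr73.beta_plus()
kr72 = rb73.proton_emission()

print(" -> ".join(x.label() for x in (sr73, rb73, kr72)))
print("mirror partner of", sr73.label(), "is", sr73.mirror().label())
```

Run as written, the chain reproduces the sequence described above (strontium-73, rubidium-73, krypton-72), and the mirror partner of strontium-73 comes out as bromine-73.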

The results allowed a nuclear property 
known as spin to be characterized, and 
revealed something unexpected. The ground 
state of strontium-73 turns out not to have a
spin of 1/2, as the ground state of bromine-73 
does, but instead has a spin of 5/2, which corresponds
to the first excited state of its mirror
partner. Thus, mirror symmetry has now been 
shown to be broken in bound nuclear ground 
states. 

Is this breaking of mirror symmetry a 
disaster for our understanding of the structure 
of the atomic nucleus? Not at all. Deviations 
from expectations challenge our knowledge of 
nuclear structure, and allow nuclear scientists
to fine-tune their models to describe atomic
nuclei. As Hoff et al. show, the observed mir- 
ror-symmetry breaking might be triggered 
by the existence of two competing nuclear 
shapes, a prolate (rugby-ball) shape and an 
oblate (disk) shape. Both structures give the 
nuclei approximately the same energy and 
mass. These two shapes can mix, and the 
symmetry breaking in bromine-73 and stron- 
tium-73 might arise because there is a different 
degree of mixing in the two nuclei. 

It will be interesting to see whether other 
cases of ground-state mirror-symmetry break- 
ing can be found. No other candidates seem 
to exist for nuclei that have similar numbers 
of nucleons to bromine-73 and strontium-73, 
because no nucleus is known for which the 
first excited state lies very close to the ground 
state. However, heavier nuclei are promising 
candidates. With more nucleons, more nuclear 
energy levels can be built, and the energy levels
come closer together. By contrast, no mirror 
partners exist for nuclei whose mass num- 
ber (the sum of the proton number and the 
neutron number) is greater than about 100 
(ref. 10), because the nuclear interaction can 
no longer overcome the electrical repulsion 
associated with interactions between the pro- 
tons in the ‘proton-rich’ mirror partner. The 
race is on to find more cases of broken mirror
symmetry in nuclear ground states.


Sustainable Development Goal 14 of the United Nations aims to “conserve and
sustainably use the oceans, seas and marine resources for sustainable development”.
Achieving this goal will require rebuilding the marine life-support systems that deliver
the many benefits that society receives from a healthy ocean. Here we document the
recovery of marine populations, habitats and ecosystems following past conservation 
interventions. Recovery rates across studies suggest that substantial recovery of the 
abundance, structure and function of marine life could be achieved by 2050, if major 
pressures—including climate change—are mitigated. Rebuilding marine life 
represents a doable Grand Challenge for humanity, an ethical obligation and a smart 
economic objective to achieve a sustainable future. 


The ability of the ocean to support human wellbeing is at a crossroads. 
The ocean currently contributes 2.5% of global gross domestic prod- 
uct (GDP) and provides employment to 1.5% of the global workforce, 
with an estimated output of US$1.5 trillion in 2010, which is expected 
to double by 2030’. Furthermore, there is increased attention on the 
ocean as a source of food and water’, clean energy! and as a means to 
mitigate climate change**. However, many marine species, habitats and 
ecosystems have suffered catastrophic declines> ®, and climate change 
is further undermining ocean productivity and biodiversity’ “ (Fig. 1).

The conflict between the growing dependence of humans on ocean 
resources and the decline in marine life under human pressures (Fig. 1) is 
focusing attention on the connection between ocean conservation
and human wellbeing’. The United Nations Sustainable Development 
Goal 14 (UN SDG 14 or ‘life below water’) aims to “conserve and sustain- 
ably use the oceans, seas and marine resources for sustainable develop- 
ment” (https://sustainabledevelopment.un.org/sdg14). Achieving this 
goal will require rebuilding marine life, defined in the context of SDG 14
as the life-support systems (populations, habitats and ecosystems) that 
deliver the many benefits that society receives from a healthy ocean’*”.
Here we show that, in addition to being a necessary goal, substantially 
rebuilding marine life within a human generation is largely achievable, 
if the required actions—including, notably, the mitigation of climate 
change—are deployed at scale. 


Reversing the decline of marine life 

By the time the general public admired life below water through the 
television series ‘The Undersea World of Jacques Cousteau’ (1968-1976), 
the abundance of large marine animals was already greatly reduced*”"*. 
Since the first frameworks to conserve and sustain marine life were
introduced in the 1980s, the abundance of marine animals and habitats
that provide essential ecosystem services has shrunk even further**?”°
(Fig. 1). Currently, at least one-third of fish stocks are overfished”, one- 
third to half of vulnerable marine habitats have been lost’, a substantial 
fraction of the coastal ocean suffers from pollution, eutrophication, 
oxygen depletion and is stressed by ocean warming”, and many 
marine species are threatened with extinction’. Nevertheless, biodi- 
versity losses in the ocean are less pronounced than on land’ and many 
marine species are capable of recovery once pressures are reduced or 
removed (Figs. 2,3). Substantial areas of wilderness remain in remote 
regions” and large populations of marine animals are still found, for 
example, in mesopelagic (200-1,000 m depth) ocean waters””. 

Regional examples of impressive resilience include the rebound of 
fish stocks during World War I and World War II following a marked 
reduction in fishing pressure”’, the recovery since 1958 of coral reefs 
in the Marshall Islands from 76 megatons of nuclear tests”? and the 
improved health of the Black Sea®’ and Adriatic Sea* following a sudden 
reduction in the application of fertilizers after the collapse of the Soviet
Union. Although these rapid recoveries were unrelated to conservation 
actions, they helped to inform subsequent interventions that have been 
deployed in response to widespread ocean degradation’””’. These 
interventions include a suite of initiatives to save threatened species, 
protect and restore vulnerable habitats, constrain fishing, reduce pollution
and mitigate climate change (Fig. 1 and Table 1).


Impactful interventions 

The regulation of hunting. The protection of species through the 
Convention on International Trade of Endangered Species (CITES, 
1975, https://cites.org/) and the global Moratorium on Commercial 
Whaling (1982, https://iwc.int/home) are prominent examples of international
actions to protect marine life* (Fig. 1). These actions have been
supplemented by national initiatives to reduce hunting pressure on
endangered species and protect their breeding habitat**®.


¹Red Sea Research Center (RSRC), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. ²Arctic Research Centre, Department of Biology, Aarhus University, Aarhus,
Denmark. ³Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. ⁴Department of Economics, Colorado State
University, Fort Collins, CO, USA. ⁵Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA. ⁶Departamento de Ecología,
Facultad de Ciencias Biológicas and Centro Interdisciplinario de Cambio Global, Pontificia Universidad Católica de Chile, Santiago, Chile. ⁷Laboratoire d’Océanographie de Villefranche,
Sorbonne Université, CNRS, Villefranche-sur-Mer, France. ⁸Institute for Sustainable Development and International Relations, Sciences Po, Paris, France. ⁹Monegasque Association on Ocean
Acidification, Prince Albert II of Monaco Foundation, Monaco, Monaco. ¹⁰Department of Earth & Environment, Boston University, Boston, MA, USA. ¹¹Department of Biology, Boston University,
Boston, MA, USA. ¹²Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, Australia. ¹³National Museum of Natural History,
Smithsonian Institution, Washington, DC, USA. ¹⁴School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia. ¹⁵Department of Biology, Dalhousie University,
Halifax, Nova Scotia, Canada. ¹⁶Alfred Wegener Institute, Integrative Ecophysiology, Bremerhaven, Germany. ¹⁷Department of Environment and Geography, University of York, York, UK.


✉e-mail: carlos.duarte@kaust.edu.sa


Nature | Vol 580 | 2 April 2020 | 39


Review


[Fig. 1 timeline. Phases shown: sharp increase in pressures on and decline in marine life; efforts to slow down pressures; opportunity to rebuild marine life. Labelled milestones include the International Fisheries Exhibition (London, 1883), the Geneva Convention on the Law of the Sea (1958), MARPOL (1973), CITES (1975), IWC Whaling (1982), UNCLOS (1982), UNCBD (1993), and the UN SDGs (2015) and Paris Agreement.]


Fig. 1| Global pressures on marine life. Many human pressures commenced
well before the industrial revolution; a number of those pressures peaked in the
1980s and are slowing down at present (with great regional variation), with the
notable exceptions of pollution and climate change. Initially, hunting and
fishing were followed by deforestation, leading to excess sediment export and
the direct destruction of coastal habitats. Pollution (synthetic fertilizers,
plastic and industrial chemicals) and climate change represent more-recent
threats. Hunting of megafauna has been heavily regulated or banned and


Management of fisheries. Successful rebuilding of depleted fish 
populations has been achieved at local and regional scales through 
well-proven management actions, including catch and effort restric- 
tions, closed areas, regulation of fishing capacity and gear, catch shares 
and co-management arrangements***° (Supplementary Information 1).
These interventions require detailed consideration of socio-economic 
circumstances, with solutions being tailored to the local context”. 
Persistent challenges include harmful subsidies, poverty and lack of 
alternative employment, illegal, unregulated and unreported fishing, 
and the disruptive ecological impacts of many fisheries***’. 


Water-quality improvement. Policies to lower inputs of nutrients 
and sewage to reduce coastal eutrophication and hypoxia were initi- 
ated four decades ago in the United States and European Union (EU), 
leading to major improvements today*? ”. Many hazardous pollutants 
have been regulated or phased out through the Stockholm Convention 
(http://www.pops.int/) and, specifically in the ocean, by the MARPOL 
Convention (http://www.imo.org/), often reinforced by national and 
regional policies. Recent attention has focused on reducing and pre- 
venting plastic pollution from entering the ocean, which remains a 
growing problem; inputs of plastic are currently estimated at between 
4.8 and 12.7 million metric tons per year*’.


Habitat protection and restoration. The need to better protect sen- 
sitive habitats, including non-target species, has inspired the use of 
Marine Protected Areas (MPAs) as a comprehensive management 
tool?*>***. In 2000, only 3.2 million km² (0.9%) of the ocean was protected,
but MPAs now cover 26.9 million km² (7.4% of ocean area, or
5.3% if only considering fully implemented MPAs) (http://mpatlas.org/,
accessed 6 March 2020). MPA coverage continues to grow at about 8%
per year” (Fig. 2 and Supplementary Video 1).


fishing is now progressing towards more-sustainable harvests in many regions,
and regulatory frameworks are reducing some forms of pollution. Climate
change—caused by greenhouse gas emissions that have accumulated since the
onset of the industrial revolution—became considerable compared with
background variability in the 1960s, and is escalating as greenhouse gases
continue to accumulate. As a net result of these cumulative human pressures,
marine biodiversity experienced a major decline by the end of the twentieth
century.

The twenty-first century has also seen a global surge of active habitat 
protection and restoration initiatives (Fig. 2, Supplementary Information 1
and Supplementary Videos 1, 2), even in challenging environments
adjoining coastal megacities (Supplementary Information 1).
These efforts have delivered benefits, such as improved water quality 
following oyster reef restoration. Additionally, Blue Carbon strategies, 
submitted within the nationally determined contributions (NDCs) of 
more than 50 nations—at the heart of the Paris Agreement—are being 
used to mitigate climate change and improve coastal protection by 
restoring seagrass, saltmarsh and mangrove habitats* *” (Supplemen- 
tary Information 1). 
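
To give a feel for what the quoted growth in MPA coverage implies, the following minimal sketch projects the 7.4% coverage forward. It assumes, purely for illustration, that the rate of about 8% per year stays constant and compounds annually; the function name and the choice of horizons are hypothetical and not taken from the source.

```python
import math


def years_to_reach(current_pct: float, target_pct: float, annual_growth: float) -> float:
    """Years needed to grow from current_pct to target_pct at a constant compound rate."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    if target_pct <= current_pct:
        return 0.0
    return math.log(target_pct / current_pct) / math.log(1.0 + annual_growth)


coverage_now = 7.4   # per cent of ocean area in MPAs (from the text)
growth = 0.08        # about 8% per year (from the text), assumed constant

# Time for coverage to double under the constant-growth assumption.
doubling_years = years_to_reach(coverage_now, 2 * coverage_now, growth)
print(f"doubling time: {doubling_years:.1f} years")

# Projected coverage at a few illustrative horizons.
for years in (5, 10, 20):
    projected = coverage_now * (1.0 + growth) ** years
    print(f"after {years:2d} years: {projected:.1f}% of ocean area")
```

Under these assumptions coverage doubles in about nine years. Real growth is unlikely to stay constant, so the numbers are an order-of-magnitude guide rather than a forecast.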


Recovery to date 

Reductions to the risk of extinction. The proportion of marine species 
assessed by the IUCN (International Union for Conservation of Nature) 
Red List as threatened with global extinction (Supplementary Informa- 
tion 2) has decreased from 18.0% in 2000 to 11.4% in 2019 (s.d. =1.7%, 
n=1,743), with trends being relatively uniform across ocean basins 
and guilds (Supplementary Fig. 2.1). In part, this reflects the growing 
number of species that have been assessed. However, many assessed 
species have improved their threat status over the past decade*®*!, 
For marine mammals, 47% of 124 well-assessed populations** showed 
a significant increase over the past decades, with 40% unchanged and
only 13% decreasing (Fig. 3b and Supplementary Table 2). Some large 
marine species have exhibited particularly notable rebounds, even from 
the brink of extinction (Fig. 3c). Humpback whales migrating from Ant- 
arctica to eastern Australia have been increasing at 10% to 13% per year, 
from a few hundred animals in 1968 to more than 40,000 currently®.
Northern elephant seals recovered from about 20 breeding individuals
in 1880 to more than 200,000 today”, and grey seal populations have
increased by 1,410% in eastern Canada™ and 823% in the Baltic Sea*!
since 1977. Southern sea otters have grown from about 50 individuals
in 1911 to several thousand at present®. While still endangered, most
sea turtle populations for which trends are available are increasing in
size, with increases in green turtle nesting populations ranging from
4 to 14% per year”.


Fig. 2| Global growth of restoration interventions. Distribution and growth
of MPAs (a) and ecosystem restoration projects for coral and oyster reefs (b),
saltmarshes and mangroves (c), and kelps and seagrasses (d); and the growth of
MPAs as per cent of the total ocean area (e) and reported restoration projects
(f) over time. NA, date not available. Numbers within symbols represent
aggregated restoration projects for which the location was not provided
(see Supplementary Information 1 for detailed examples, Supplementary
Information 2 for data sources and Supplementary Videos 1, 2 for the animation
of growth over time).


Fig. 3| Recovery trends of marine populations. a, Current population trends
in scientifically assessed fish stocks based on the ratio of the annual biomass
B relative to the biomass that produces the maximum sustainable yield (B_MSY).
b, Percentage of assessed marine mammal populations that showed increasing
or decreasing population trends or showed no change. c, Sample trajectories of
recovering species and habitats from different parts of the world. Units were
adjusted to a common scale by multiplying or dividing as indicated in the


Recovery of fish stocks. Using a comprehensive stock-assessment 
database™, we find that fish stocks with available scientific assessments 
are increasingly managed for sustainability. The proportion of stocks 
with fishing mortality estimates (F) below the level that would produce 
a maximum sustainable yield (F < F_MSY) has increased from 60% in 2000
to 68% in 2012. Many fish stocks that are subject to such management 
interventions display positive trends (Fig. 3a), and globally aggregated 
stock assessments suggest a slowing down of the depletion of fish 
stocks”**?, although this trend cannot be verified for the majority of 
stocks, which lack scientific assessments”®. The most recent report of 
the Food and Agriculture Organization on global fisheries” also sug- 
gests that two thirds of large-scale commercial fish stocks are exploited 
at sustainable rates—although, again, this figure also does not account
for smaller stocks or non-target bycatch species, which are often not
assessed and in poor condition®™. Available data suggest that scientifically
assessed stocks generally have a better likelihood of recovery
owing to improved management and regulatory status compared with
unassessed stocks®*, which still represent the majority of exploited fish
stocks, especially in developing countries.


legend (nx), numbers at the end of the legends indicate the initial count at the
beginning of time series. d, Range of recovery times for marine populations
and habitats, and mean + 95% confidence limits recovery times for marine
ecosystems. Lines indicate the reported range; where extending to 60 years,
the maximum recovery time is 60 years or longer. See Supplementary
Information 2 for details on data sources and methods, and Supplementary
Table 3 for data sources for data shown in d.


Reduction in pollution. Time-series analyses show that legacy persis- 
tent organic pollutants have declined even in marine environments 
that tend to accumulate them (for example, the Arctic»). The transition 
towards unleaded gasoline since the 1980s has reduced lead
to concentrations comparable to baseline levels across the
global ocean by 2010-2011°*. Similarly, the total ban in 2008 of the anti- 
fouling chemical tributyltin (TBT) has led to rapid declines of imposex 
(females that develop male sexual organs)—a TBT-specific symptom—in 
an indicator gastropod”. Improved safety regulations have also led to 
a 14-fold reduction in large oil spills from oil tankers, from 24.7 events
per year in the 1970s to 1.7 events per year in 2010-2019°. Whereas 
evidence of improved coastal water quality following nutrient reduc- 
tions was equivocal a decade ago”, multiple success stories have now 
been confirmed”, with positive ecosystem effects such as the net 
recovery of seagrass meadows in the United States” (Fig. 1), Europe”, 
the Baltic Sea*' and Japan®. 


Habitat restoration. Evidence that mangrove restoration can be 
achieved at scale first came from the Mekong Delta mangrove forest, 
possibly the largest (1,500 km²) habitat restoration undertaken to
date®*, Global loss of mangrove forests has since slowed to 0.11% per 
year’, with stable mangrove populations along the Pacific coast of 
Colombia, Costa Rica and Panama®, and increasing populations in 
the Red Sea®’, Arabian Gulf”? and China”. Large-scale restoration of 
saltmarshes and oyster reefs has occurred in Europe and the United 
States (Fig. 2 and Supplementary Information 1). Restoration attempts
of seagrass, seaweed and coral reef ecosystems are also increasing 
globally, although they are often small in scale (Fig. 2, Supplementary 
Video 2 and Supplementary Information 1). Notably, a global inventory 
of total restored area is missing. 


Potential for rebuilding 


Efforts to rebuild marine life cannot aim to return the ocean to any 
particular past reference point. Our records of marine life are too fragmented
to compose a robust baseline, and the ocean has changed considerably
and, in some cases, irreversibly, including the extinction of
at least 20 marine species”. We argue instead that the focus should be 
on increasing the abundance of key habitats and keystone species, and
restoring the three-dimensional complexity of benthic ecosystems. 
The yardstick of success should be the restoration of marine ecological 
structure, functions, resilience and ecosystem services, increasing the 
capacity of marine biota to supply the growing needs of an additional 
2 to 3 billion people by 2050. To meet this goal, rebuilding of depleted
populations and ecosystems must replace the goal of conserving and 
sustaining the status quo, and swift action should be taken to avoid 
potential tipping points beyond which collapse may be irreversible”"®*. 

Here we examine the rates of recovery of marine species and habitats 
to date, and propose a tentative timeframe in which substantial recov- 
ery of marine life may be possible, should major pressures, including 
climate change, be mitigated. We broadly define recovery as the 
rebound in populations of marine species and habitats following losses, 
which can be partial (that is, 10–50% increase), substantial (50–90%
increase) or complete (>90% increase)” (Table 1). 
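
The three recovery categories defined above map naturally onto a small classification function. The sketch below is illustrative only: the function name is hypothetical, and the handling of the boundary values (for example, treating exactly 50% as 'substantial') is an assumption, because the quoted ranges share their end points.

```python
def classify_recovery(percent_increase: float) -> str:
    """Map a percentage rebound onto the partial/substantial/complete categories.

    Boundary handling is an assumption: 50% counts as 'substantial',
    and only values strictly above 90% count as 'complete'.
    """
    if percent_increase > 90:
        return "complete"
    if percent_increase >= 50:
        return "substantial"
    if percent_increase >= 10:
        return "partial"
    return "below the partial-recovery threshold"


for value in (5, 10, 35, 50, 90, 95):
    print(f"{value:3d}% increase -> {classify_recovery(value)}")
```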


Marine megafauna 

A number of megafauna species, including humpback whales and 
northern elephant seals, have recovered to historical baselines fol- 
lowing protection (Fig. 3c); however, rates of recovery depend on the 
life history of the species: some large whales may require more than 
100 years to recover, whereas smaller pinnipeds may only need several 
decades®* (Fig. 3c, d). Sea turtles have recovery timescales of up to 
100 years, although some populations have partially recovered much 
faster (for example, green turtles in Hawaii increased sixfold between 
1973 and 2016)”. Seabird populations typically require a few decades 
to recover**" (Fig. 3c, d). 
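
Population rebounds quoted as 'from X to Y over some period' can be put on a common footing by converting them to an implied average annual growth rate. The sketch below does this under the simplifying assumption of constant compound growth; the function name is hypothetical, and taking 'today' to mean 2020 for the elephant seal example is an assumption made only for illustration.

```python
def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Constant compound annual rate that turns `start` into `end` over `years`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Green turtles in Hawaii: sixfold increase between 1973 and 2016 (from the text).
turtle_rate = implied_annual_growth(1.0, 6.0, 2016 - 1973)

# Northern elephant seals: about 20 individuals in 1880 to more than 200,000
# 'today'; the end year 2020 is an assumption for this example.
seal_rate = implied_annual_growth(20, 200_000, 2020 - 1880)

print(f"green turtles (Hawaii): {turtle_rate:.1%} per year")
print(f"northern elephant seals: {seal_rate:.1%} per year")
```

Both examples come out at a few per cent per year (roughly 4% and 7%, respectively), which shows how modest annual rates, sustained over decades, produce the large rebounds described in the text.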


Fish stocks 

Recovery can also refer to achieving resilient populations that support 
the full extent of ecosystem functions and services that characterize 
them. For instance, fish stock recovery is often defined in terms of
biomass increases to the level that enables the maximum sustainable
yield (B_MSY), which fisheries harvest theory predicts to be between 37%
and 50% of the virgin biomass (B_0), depending on the particular model
used (Supplementary Information 2 and Supplementary Fig. 2.2). This
range is consistent with an empirical estimate of B_0 for 147 exploited
fish stocks, which found that contemporary B_MSY values were 40% of B_0,
on average, with a range of 26% to 46% across taxa. Reported recovery
times to B_MSY for overexploited finfish and invertebrate stocks range
between 3 and 30 years (Figs. 3, 4), which is consistent with
palaeoecological reconstructions of prehistoric collapse and recovery of
anchovy, sardine and hake stocks, data from fisheries closures
and fish stock assessments. However, B_MSY should be considered to
represent a minimum recovery target, as it does not account for
ecosystem interactions, and might provide only limited resilience in the
face of environmental uncertainty and change.
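
As a rough illustration of why harvest theory places B_MSY between about 37% and 50% of B_0, the sketch below maximizes surplus production under two textbook models. The choice of the Schaefer (logistic) and Fox (Gompertz) forms is an assumption made here for illustration; this section does not name the models used in Supplementary Information 2, and the function names and parameter values are hypothetical.

```python
import math

def schaefer_production(b, r, k):
    """Logistic (Schaefer) surplus production at biomass b."""
    return r * b * (1.0 - b / k)

def fox_production(b, r, k):
    """Gompertz (Fox) surplus production at biomass b."""
    return r * b * math.log(k / b)

def bmsy_fraction(production, r=0.3, k=1.0, steps=100_000):
    """Grid-search the biomass that maximizes surplus production.

    Returns B_MSY as a fraction of the unfished biomass k (B_0).
    r and k are illustrative values; the ratio does not depend on them.
    """
    best_b, best_p = 0.0, float("-inf")
    for i in range(1, steps):  # skip b = 0 (log undefined) and b = k (zero yield)
        b = k * i / steps
        p = production(b, r, k)
        if p > best_p:
            best_b, best_p = b, p
    return best_b / k

if __name__ == "__main__":
    print(f"Schaefer B_MSY/B_0: {bmsy_fraction(schaefer_production):.3f}")
    print(f"Fox      B_MSY/B_0: {bmsy_fraction(fox_production):.3f}")
```

Under these two assumed forms the maximum falls at one half of B_0 and at 1/e (about 0.37) of B_0, respectively, which brackets the theoretical range quoted above.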

Minimum recovery times of populations are set by the maximum
intrinsic rate of population increase (r_max), which is typically higher
than observed rates, resulting in longer recovery times. Recovery
rates also depend on the fishing pressure imposed on the stock; for
example, rebuilding depleted populations to B_MSY may take less than a
decade, if fishing mortality is rapidly reduced below F_MSY. Longer
recovery times are expected if fishing pressure is reduced more slowly
(Fig. 4). Recovery for longer-lived, slow-growing species such as most
elasmobranchs (sharks, rays and skates), depleted coral reef fish and
deep-sea species may take much longer.


Coastal habitats 

The recovery of coastal habitats after the removal of stressors or
following active restoration of the habitat typically occurs on a similar
timescale to fish stock recovery: less than a decade for oyster reefs
and other invertebrate populations (Supplementary Information 3)
and kelp-dominated habitats, one to two decades for
saltmarsh and mangrove habitats, and one to several decades
for seagrass meadows (Fig. 3d). Deep-sea corals and sponges grow
more slowly and recovery times from trawling disturbance or oil
spills may range from 30 years to more than a century. Recovery
timescales of coral reefs that are affected by local stressors range
from a few years to more than a decade (Fig. 3d). However, recovery
from severe coral bleaching has taken well over a decade and will
slow in the future as ocean warming shortens the interval between
bleaching events, with an associated steep reduction in coral-reef
recruitment.

In summary, available data suggest that many marine species and 
habitats require one to three decades to approach undisturbed or refer- 
ence abundance ranges and fish stock biomass that supports maximum 
sustainable fish catches after removal of the causes of decline,
with longer recovery times required for some slow-growing groups
(Fig. 3). 


Recovery times 
The time that is required to rebuild components of marine life depends 
on the extent of previous declines, which are often substantial. The 
reduction in species abundance and biomass relative to predisturbance 
baselines averages about 44 and 56%, respectively, across affected 
marine ecosystems. Similarly, the Living Blue Planet Report estimated
a 49% decline in the abundance of marine animal populations
between 1970 and 2012, although many species and habitats have
declined further since. Moreover, although the maximum rates of
recovery of marine populations typically range from 2 to 10% per year
(Fig. 3c), rates slow down as carrying capacity is approached. Assuming
a reported average annual recovery rate of 2.95% (95% confidence
interval, 2.42-3.41%) across marine ecosystems and a characteristic
rebuilding deficit of about 50% of predisturbance baselines, we
provisionally estimate that the average time to reach 90% of undisturbed
baselines (that is, achieve substantial recovery) would be about 21 years
(95% confidence interval, 18-25 years) (Fig. 3d). However, the
expectation of an average recovery time of about two decades is compromised
by the fact that many species and habitats continue to decline and
some pressures, such as climate change and plastic pollution, are still
increasing (Fig. 1). Thus, substantial (50-90%), rather than complete
(>90%), recovery may be a more realistic target for rebuilding marine
life in the short term.
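
The order of magnitude of this estimate can be checked with a simple compound-growth calculation. The sketch below assumes a constant annual recovery rate applied to a population starting at 50% of its baseline and growing to 90%; this constant-rate model and the function name are assumptions made for illustration, since the exact calculation behind the reported figure is not given in this section.

```python
import math

def years_to_target(start_fraction, target_fraction, annual_rate):
    """Years of constant compound growth needed to rise from start to target.

    Both fractions are expressed relative to the predisturbance baseline.
    """
    if not (0 < start_fraction < target_fraction):
        raise ValueError("need 0 < start_fraction < target_fraction")
    if annual_rate <= 0:
        raise ValueError("annual_rate must be positive")
    return math.log(target_fraction / start_fraction) / math.log(1.0 + annual_rate)

if __name__ == "__main__":
    # Mean rate and the two confidence-interval bounds quoted in the text.
    for label, rate in [("mean", 0.0295), ("lower bound", 0.0242), ("upper bound", 0.0341)]:
        years = years_to_target(0.5, 0.9, rate)
        print(f"{label:11s} rate {rate:.2%}: {years:.1f} years")
```

Under these assumptions the mean rate gives roughly 20 years and the confidence bounds give roughly 18 to 25 years, close to the reported estimate of about 21 years (18-25 years). Because rates slow as carrying capacity is approached, a constant-rate model is best read as a lower bound on the time required.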

Based on the case studies examined, we provisionally propose three
decades from today (2050) as a target timeline for substantial (that
is, 50-90%) recovery of many components of marine life (Table 1),
recognizing that many slow-growing, severely depleted species and
threatened habitats may take longer to recover (Fig. 3), and that natural
variability may delay recovery further (Fig. 4).

Table 1 | Scenarios conducive to achieving the best aspirational outcomes towards rebuilding marine life

Rebuilding wedge         Saltmarshes  Mangroves  Seagrass  Coral reefs  Kelp      Oyster reefs  Fisheries  Megafauna  Deep-sea habitats
Protect species          Low          Low        Low       Low          Low       High          Critical   Critical   Critical
Harvest wisely           Low          Critical   Low       High         High      Critical      Critical   Critical   Critical
Protect spaces           Critical     Critical   Medium    High         Medium    Critical      High       High       Critical
Restore habitats         Critical     Critical   High      Medium       Medium    Critical      Medium     Medium     Medium
Reduce pollution         Medium       Medium     Critical  Critical     Critical  High          Medium     Medium     High
Mitigate climate change  High         High       High      Critical     High      High          High       High       High

Recovery targets by 2050
  Saltmarshes, mangroves, seagrass, kelp, oyster reefs, fisheries: Substantial to complete
  Megafauna: Substantial
  Coral reefs, deep-sea habitats: Partial to substantial

Key actors
  Saltmarshes, mangroves, seagrass: Government, civil society and NGOs.
  Coral reefs: Government, tourism operators, fishers organizations, civil society and NGOs.
  Kelp: Government, fishers organizations and civil society.
  Oyster reefs: Government, fishers organizations, NGOs and civil society.
  Fisheries: Government, fishers organizations and civil society.
  Megafauna: Government, fishers organizations, NGOs and civil society.
  Deep-sea habitats: International seabed authority, state and federal governments, mining/exploration companies, civil society and fishing industry.

Key actions
  Saltmarshes: Protection of remaining saltmarshes, providing sources of sediment, potentially planting native species, providing space for landward migration and restoring hydrological connections.
  Mangroves: Protection, provide alternative livelihoods for dependent communities, provide space for landward migration, restore hydrological connections, maintain sediment supply and restore damaged forests.
  Seagrass: Reduce nutrient inputs, protect, avoid physical impacts, and conduct restoration projects.
  Coral reefs: Ambitious reduction in greenhouse gas emissions. Reduce excess sediment and nutrient inputs, improve water quality, protect reefs, rebuild food webs and restore damaged reefs.
  Kelp: Restoration requires removal of excess herbivores, by rebuilding their predators, and a reduction in sediment loads on rocky substrates and kelps.
  Oyster reefs: Protect remaining reefs, prohibition of natural reef harvests, improve water quality and restore reefs.
  Fisheries: Reduce overfishing, bycatch and incidental mortality, ban destructive fishing practices, protect spawning/breeding areas and nursery grounds, and remove perverse incentives.
  Megafauna: Protect, reduce bycatch, reduce incidental mortality (ship strikes, entanglement, ghost gear), reduce pollution (noise, debris, chemical), protect breeding/haul-out sites, safeguard migration routes and reduce competition with fisheries.
  Deep-sea habitats: Regulate industries that operate in the deep sea. Ban deep-sea fishing and impose a moratorium on deep-sea mining until technologies free of impact are available. Improve environmental safety of oil and gas operations. Develop facilities to test technologies before real-ocean deployment.

Key opportunities
  Saltmarshes, mangroves, seagrass: Blue Carbon and coastal defence strategies against storms and sea-level rise, links to management for enhancing water quality, food provision and biodiversity strategies.
  Coral reefs: Link to coastal defence, food provision and biodiversity strategies.
  Kelp: Emerging role in Blue Carbon, water quality and biodiversity strategies.
  Oyster reefs: Link to water quality improvement, biodiversity and coastal protection strategies.
  Fisheries: Sustainable seafood, MSC-certified fisheries, develop sustainable aquaculture to reduce pressure on wild stocks.
  Megafauna: Marine wildlife tourism, cultural benefits and ethics.
  Deep-sea habitats: High percentage of unique, unexplored habitats and new species, potential for novel products important in fighting/preventing disease. Huge carbon-sink potential.

Key benefits
  Saltmarshes: Improved fisheries, protection from sea-level rise and storm surges, recreational and cultural benefits, hunting.
  Mangroves: Improved fisheries, biodiversity and coastal defence, recreation and cultural benefits.
  Seagrass: Protect shoreline from erosion and rebuilding biodiversity and fisheries.
  Coral reefs: Provision of fish, protection from sea-level rise and storm surges, recreational and cultural benefits.
  Kelp: Enhanced fisheries.
  Oyster reefs: Improved water quality, increased habitat, recreational and cultural benefits, food sources.
  Fisheries: Improved quality and quantity of seafood supply.
  Megafauna: Increased connectivity among ocean basins, enhanced nutrient cycling and ocean productivity.
  Deep-sea habitats: Huge potential for discoveries and new resources. Avoidance of irreversible damage.

Roadblocks
  Saltmarshes: Many saltmarshes are filled, landward migration impeded because of infrastructure, not enough sediment ...
  Mangroves: Alternative land uses and infrastructure, lack of alternative livelihoods and incentives for communities, ...
  Seagrass: Infrastructure (for example, areas occupied by harbours), severe and frequent heat waves with climate ...
  Coral reefs: Dependence on climate change trajectories, mortality with ocean warming, ocean acidification ...
  Kelp: Climate change at the edge of the equatorial range of kelp species, high herbivore pressure and sediment ...
  Oyster reefs: Poor management of fisheries on remaining reefs, degraded habitats, restoration costs, ...
  Fisheries: Cumulative impacts from fishing, pollution, habitat alterations, changing distribution ranges, ...
  Megafauna: Losses due to extinction, continued impacts from ship strikes, pollution, habitat alterations, changing ...
  Deep-sea habitats: Slow and uncertain recovery and success of, hugely costly restoration, which will be extremely difficult and ...

Actions include rebuilding wedges, assessment of the maximum recovery targets by 2050 if these wedges are fully activated, as well as key actors, opportunities, benefits, roadblocks and
remedial actions to rebuild different components of marine life (priority increases from low to critical). See Supplementary Information 3 for details.

Importantly, achieving substantial recovery by 2050 requires that 
major pressures are mitigated soon, including climate change under 
the Paris Agreement. Climate change affects the demography, phenol- 
ogy and biogeography of many marine species and compromises the 
productivity of marine ecosystems (Fig. 4). Current impacts
of realized climate change on many coral reefs raise concerns about
the future prospects of these ecosystems (Table 1). If we succeed in
mitigating climate change and other pressures, we may witness a trend
change from a previous steep decline to stabilization and, in many
cases, substantial global recovery of marine life in the twenty-first 
century (Figs. 1-4). 


A roadmap to recovery


Steps taken to rebuild marine life to date have involved a process of
trial and error that delayed positive outcomes (for example, reduction
of excessive nutrient inputs in the EU and United States), but that
generated know-how to cost-effectively propel subsequent efforts
at scale. Improved ocean stewardship, as required by UN SDG 14, is a
goal shared across many nations, cultures, faiths and political systems,
occupying a more prominent place in the agendas of governments,
corporations, philanthropists and individuals than ever before.
This provides a window of opportunity to mitigate existing pressures 
over the next decade while supporting global initiatives to achieve 
substantial recovery of marine life by 2050 (Table 1 and Supplementary
Information 3). We are at a point at which we can choose between
a legacy of a resilient and vibrant ocean or an irreversibly disrupted
ocean, for the generations to follow.

Fig. 4 | Recovery projections for assessed fish stocks. a, Trajectories of
exploited fish stock biomass (B) relative to the biomass supporting the
maximum sustainable yield (B_MSY; the ratio of which is denoted B/B_MSY) over
time based on the scientific assessment of 371 globally distributed fish stocks
in the RAM Legacy Stock Assessment Database (version 4.44). Open circles
indicate the biomass-weighted global average of stock B/B_MSY, asterisks
represent years without sufficient data, red and green lines represent four
idealized future scenarios (B_MSY values were taken from stock assessments
where available and estimated as 50% of the maximum historical biomass
otherwise; see Supplementary Information 2). Grey shading represents the one
s.d. range of the simulations. Purple diamonds give the proportion of the
database used in the calculation of B/B_MSY for each year. b, Frequency
distributions for estimated recovery times to B_MSY for 172 stocks that are
currently depleted to below B_MSY. Projections refer to three scenarios,
corresponding to no fishing, fishing at 60% or 90% of fishing pressure
associated with the maximum sustainable yield (F_MSY). Projections show that
under various scenarios of reduced fishing pressure (F < F_MSY) and different
productivity regimes, the majority of fish stocks could recover to B_MSY with
high probability before 2040. Recovery to virgin biomass (B_0) would take much
longer. Solid lines indicate the median and hashed lines the mean estimate of
years to recovery. Productivity for each stock in b was fixed to the mean
stock-specific historical productivity. See Supplementary Information 2 for
details of data sources and methods.

Some of the interventions required to rebuild marine life have already
been initiated, but decadal time lags suggest that the full benefits are
yet to be realized. Because most policies to reduce local pressures
and prompt recovery of marine life were introduced after the
1970s (Figs. 1, 2), it is only now that comprehensive benefits (Fig. 3) are
becoming evident at a larger scale. Similarly, as most current MPAs are
less than 10 years old (Fig. 2), their full benefits, which increase with the
age of the reserve, are yet to be realized, particularly for MPAs that
are properly managed and enforced.


Recovery wedges
There is no single solution for achieving substantial recovery of marine
life by 2050. Rather, recovery requires the strategic stacking of a number
of complementary actions, here termed recovery wedges, each
of which will help to increase the recovery rate to reach or exceed the
target of 2.4% increase per year across different ecosystem components
(Table 1 and Supplementary Information 1, 3, 4). These wedges include
protecting vulnerable habitats and species, adopting cautionary
harvesting strategies, restoring habitats, reducing pollution and mitigating
climate change (Table 1 and Supplementary Information 1, 3, 4). The
strength of the contribution of each of these wedges to the recovery
target can be expected to vary across species and ecosystems. For
instance, mitigating climate change is the critical wedge to set coral
reefs on a recovery trajectory, whereas improved habitat protection
and fisheries management are the critical wedges for the recovery of
marine vertebrates and deep-sea habitats (Table 1 and Supplementary
Information 3).

Ongoing efforts to remove pressures on marine life from anthropogenic
climate change, hunting, fishing, habitat destruction, pollution
and eutrophication (Fig. 1) must be expanded and made more effective
(Table 1). A new framework to predict risks of new synthetic chemicals
is required to avoid circumstances in which industry introduces new
chemicals faster than their risks can be assessed. Challenges remain
for persistent legacy pollutants (for example, CO2, organochlorines
and plastics) that are already added to the atmosphere and oceans,
the removal of which requires novel removal technologies and
protection of long-term sinks, such as marine sediments, to avoid their
remobilization.

46 | Nature | Vol 580 | 2 April 2020

MPAs represent a necessary and powerful recovery wedge across
multiple components of the ocean ecosystem, spanning from coastal
habitats to fish and megafauna populations (Table 1). The growth of
MPAs (Fig. 2, Supplementary Video 1) is currently on track
to meet ambitious targets: 10% of ocean area protected by 2020, 30%
by 2037 and 50% by 2044. Many fish stocks could recover to B_MSY by
2030, assuming global management reforms couple the use of closed
and protected areas to measures that reduce overall fishing pressure
and collateral ecosystem damage and that are adapted to the local context
(Fig. 4 and Table 1). However, projected climate impacts on ocean
productivity and an increase in extreme events can delay recovery and,
depending on emission pathways, may prevent recovery of some
components altogether (Fig. 4). The current focus on quantitative targets
of the percentage of the ocean area that is protected has prompted
concerns over the quality and effectiveness of MPAs. Although 71%
of assessed MPAs have been successful in enhancing fish populations,
the level of protection is often weak (94% allow fishing), and many
areas are undermined by insufficient human and financial capacity.
Improving the effectiveness of MPAs requires enhanced resourcing,
governance, level of protection and siting to better match the
geography of threats and to ensure desired outcomes.
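
To make the pace implied by these milestones concrete, the sketch below computes the constant annual growth rate in protected area that would connect consecutive targets. Compound growth between milestones is an assumption made here for illustration, not a description of how the projection in Fig. 2 was derived, and the function name is hypothetical.

```python
def implied_annual_growth(start_pct, end_pct, start_year, end_year):
    """Constant compound annual growth rate linking two coverage milestones."""
    years = end_year - start_year
    if years <= 0 or start_pct <= 0 or end_pct <= 0:
        raise ValueError("need positive coverage values and end_year > start_year")
    return (end_pct / start_pct) ** (1.0 / years) - 1.0

if __name__ == "__main__":
    # Milestones quoted in the text: (year, percentage of ocean area protected).
    milestones = [(2020, 10.0), (2037, 30.0), (2044, 50.0)]
    for (y0, p0), (y1, p1) in zip(milestones, milestones[1:]):
        rate = implied_annual_growth(p0, p1, y0, y1)
        print(f"{y0}-{y1}: {p0:.0f}% -> {p1:.0f}% implies {rate:.1%} growth per year")
```

Under this assumption both intervals correspond to protected area expanding by roughly 7% per year.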

The current surge in restoration efforts (Fig. 2 and Supplementary
Video 2) can, if sustained, be an instrumental recovery wedge to meet
rebuilding targets for marine habitats by 2050 (Table 1). For instance,
assuming a mean project size of 4,197 ha, restoring mangroves
to their original extent of 225,000 km² by 2050 would require the
initiation of 70 projects per year. This is not unrealistic, as realization of
the benefits, such as reducing storm damage in low-lying areas,
encourages further growth in restoration efforts (Fig. 2 and Supplementary
Video 2). Past coastal restoration projects have reported average
success rates ranging from 38% (seagrass) to 64% (saltmarshes and
corals); however, reasons for failure are well understood,
which should improve future outcomes. Much can be learned from
increased reporting of failed attempts, because the published literature
may be biased towards successful restoration projects. Emerging
technologies are now being developed to restore coral species
in the presence of climate change, although long-term testing is
required before their effectiveness and lack of negative consequences
are demonstrated. Kelp restoration at a national scale in Japan provides
a successful model, rooted in cultural practices, for linking restoration
to sustainable fishing (Supplementary Information 1). More broadly,
these practices recognize that sustainable harvest of marine resources
ought to be balanced by broader restoration actions embedded in a
socio-ecological context, including reducing greenhouse gas emissions,
restoring habitats, removing marine litter or managing
hydrological flows to avoid hypoxia (Supplementary Information 1). These
restoration experiences (Supplementary Information 1) also show
that involvement of local communities is essential, because of their
economic dependence, commitment to place and ownership.
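
The mangrove figures above can be related to one another with a short calculation. The sketch below works out how much area 70 projects per year of 4,197 ha each would restore by 2050; the 2020 start year is an assumption (the year of publication), and the function names are hypothetical.

```python
HA_PER_KM2 = 100  # 1 km² = 100 ha

def implied_restored_area_km2(projects_per_year, mean_project_ha, start_year, end_year):
    """Total area restored if projects start at a constant yearly rate."""
    total_ha = projects_per_year * (end_year - start_year) * mean_project_ha
    return total_ha / HA_PER_KM2

def projects_per_year_needed(area_km2, mean_project_ha, start_year, end_year):
    """Yearly project starts needed to restore a given area by end_year."""
    return area_km2 * HA_PER_KM2 / mean_project_ha / (end_year - start_year)

if __name__ == "__main__":
    area = implied_restored_area_km2(70, 4197, 2020, 2050)
    share = area / 225_000  # relative to the original extent quoted in the text
    print(f"Area restored by 2050: {area:,.0f} km² ({share:.0%} of original extent)")
```

Under these assumptions, 70 projects per year restores on the order of 88,000 km², or about 39% of the original 225,000 km². This suggests that the quoted rate refers to rebuilding the area that has been lost rather than the full original extent, although that reading is an inference and is not stated in the text.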

Removing pollution is a critical recovery wedge for seagrass mead- 
ows, coral reefs and kelp forests (Table 1). Three decades of efforts to 
abate coastal eutrophication have provided valuable knowledge on how
actionable science can guide restoration successes. Additional
interventions (for example, restoring hydrological flows or rebuilding
oyster reefs) can catalyse the additional removal of nutrients while
improving biodiversity. Seaweed aquaculture can help to alleviate
eutrophication and reduce hypoxia. Nutrient reduction has
the additional benefit of locally reducing coastal acidification and
hypoxia directly and indirectly through the recovery of seagrass
meadows. Reducing sulfur dioxide precipitation, hypoxia, eutrophication,
emissions and runoff from acidic fertilizers also helps to reduce
acidification of coastal waters. Large-scale experiments in anoxic basins
of the Baltic Sea, for example, have shown that treatment of sediments
with phosphorus-binding agents helps to break biogeochemical
feedback loops that keep ecosystems in an alternative anoxic stable state.

Oil spills from oil tankers should decline further with the incoming
International Maritime Organisation (IMO) requirement (13F of Annex
I of MARPOL) for double hulls in new large oil tankers, although
deepwater drilling, illustrated by the catastrophic Deepwater Horizon spill
in 2010, and increasing risks of oil spills from future oil drilling and
oil tanker routes in the Arctic present new challenges. Noise pollution
from shipping and other industrial activities, such as drilling, pile
driving and seismic surveys, should be reduced. Similarly, worldwide
efforts to reduce or ban single-use plastic (initiated in developing
nations), taxes on plastic bags, deposits and refunds on bottles, and
other market-based instruments are being deployed to reduce marine
litter, while providing incentives to build a circular economy for
existing plastics and to develop safer materials.


Roadblocks 

A number of roadblocks may delay or prevent recovery of some of 
the critical components of marine life (Table 1). These include natural 
variability and intensification of environmental extremes caused by 
anthropogenic climate change (Fig. 4), unexpected natural or social 
events, and a failure to meet commitments to reduce existing pres- 
sures and mitigate climate change. In addition, the growing human 
population, which will probably exceed 9 billion individuals by 2050, 
will create additional demands for seafood, coastal space and other 
ocean resources. Accordingly, if all necessary recovery wedges are
stacked, a 2050 target of substantial to complete recovery (that is,
50-100% increase relative to the present) for most rebuilding
components appears realistic and achievable (Table 1). Partial to substantial
(10 to >50%) recovery can be targeted for deep-sea habitats, where slow
recovery rates lead to a modest rebuilding scope by 2050, and for coral
reefs, where existing and projected climate change severely limits the
rebuilding prospects (Table 1).

A major roadblock to recovery for intertidal habitats, such as
mangroves and saltmarshes, is their conversion to urban areas, aquaculture
ponds or infrastructure (Table 1). However, even in large cities, such as
New York and Shenzhen, some restoration of degraded habitats has been
achieved (Supplementary Information 1). Incentives to develop
alternative sources of livelihood, relocate landholders, mediate land-tenure
conflicts and improve land-use planning can release more habitat for
coastal restoration (Table 1). Tools are emerging to prioritize sites for
restoration based on past experience and a broad suite of biophysical
and socio-economic predictors of success. Reduced sediment supply
due to dam construction in watersheds is also an important challenge
for the recovery of salt marshes and mangroves, and these challenges
are exacerbated by sea-level rise and climate change (Table 1). However,
these habitats may be less vulnerable than previously thought, with
a recent assessment concluding that global gains of 60% of coastal
wetland area are possible under sea-level rise. By contrast, enhanced
sediment load from land clearing is often responsible for losses of
nearshore coral reefs and hinders their capacity to recover from coral
bleaching.


Overcoming the climate change roadblock 
Climate change is the critical backdrop against which all future
rebuilding efforts will play out. Current trajectories of greenhouse gas
emissions lead to warming by 2100 of 2.6 to 4.5 °C above preindustrial levels,
far exceeding the long-term goal of the Paris Agreement (holding the
increase in global average temperature to well below 2 °C above
preindustrial levels). Much stronger efforts to reduce emissions are
needed to reduce the gap between target emissions and projected
emissions under the present voluntary NDCs, a challenging but not
impossible task. Efforts to rebuild marine life need to consider unavoidable
impacts brought about by ocean warming, acidification and sea-level
rise already committed by past emissions, even if the climate
mitigation wedge, represented by the Paris Agreement, is fully implemented.
These changes include projected shifts in habitats and communities
at subtropical-tropical (coral to algal turf and seaweed),
subtropical-temperate (kelp to coral and urchin barrens, saltmarsh to mangrove),
temperate-Arctic (bare to kelp, ice fauna to pelagic) and intertidal
(coastal squeeze) boundaries, propelled by species displacements
and mass mortalities from future heat waves. Mapping the areas
where the likelihood of these transitions is high can help to prioritize
where and how restoration interventions should be deployed. For
instance, conserving and restoring vegetated coastal habitats will help
to defend shorelines against increased risks from sea-level rise while
helping to mitigate climate change. Well-managed MPAs may
help to build resilience to climate change. However, many of them are
already affected by ocean warming and further climate change may
potentially compromise their performance in the future.

Rebuilding coral reefs carries the highest risk of failure (Table 1), as
cumulative pressures (for example, overfishing and pollution) that
drove their historical decline are now increasingly compounded by
warming-induced bleaching. The IPCC (Intergovernmental Panel on
Climate Change) projects that global warming to 1.5 °C above
preindustrial levels will result in very high risks and losses of coral reefs unless
adaptation occurs faster than currently anticipated. A recent study
shows that while coral bleaching has increased in frequency and
intensity in the last decade, the onset of coral bleaching is now occurring
at significantly warmer temperatures (around 0.5 °C higher) than previously,
suggesting that the remaining coral populations now have a higher
thermal threshold for bleaching, due to a decline in thermally
vulnerable species and genotypes and/or acclimatization. However, the
capacity to restore coral reefs lags behind that of all other marine
habitats, because coral-reef restoration efforts typically have a very small
footprint, and are expensive and slow. Coral restoration often fails
because the original causes of mortality remain unchecked, and despite
decades of effort (Fig. 2), only tens of hectares have been regrown so far.
Our growing knowledge of ecological processes in coral reefs provides
opportunities to catalyse recovery by reducing multiple pressures while
repairing key processes, including herbivory and larval recruitment.
Mitigating the drivers of coral loss, particularly climate change, and
developing innovative approaches to restoration within this decade
are imperative to reverse coral losses at scale. Efforts are underway
to find corals that are resistant to the temperatures and acidity levels
expected by the end of the twenty-first century, to understand the
mechanisms of their resistance and to use 'assisted evolution' to
engineer these characteristics into other corals. However, these efforts
are in their infancy and their benefits are currently unproven.

Overall, the societal benefits that would accrue from substantially 
rebuilding marine life by 2050 will depend on the mitigation of greenhouse
gas emissions and on the development of efficient CO2 capture
and removal technologies to meet or, preferably, exceed the targets 
of the Paris Agreement. 


Necessary investments and expected returns 

Substantial rebuilding of marine life by 2050 requires sustained effort
and financial support (Supplementary Information 4), with an
estimated cost of at least US$10-20 billion per year to extend protection
actions to reach 50% of the ocean space and substantial additional
funds for restoration. This is comparable to establishing a global
MPA network that conserves 20-30% of the ocean (US$5-19 billion
annually). Yet the economic return from this commitment will
be considerable, around US$10 per US$1 invested and in excess of one
million new jobs. Ecotourism in protected areas provides 4-12
times greater economic returns than fishing without reserves (for
example, AUS$5.5 billion annually and 53,800 full time jobs in the Great
Barrier Reef). Rebuilt fisheries alone could increase the annual profits
of the global seafood industry by US$53 billion. Conserving coastal
wetlands could save the insurance industry US$52 billion annually
by reducing storm flooding, while providing additional benefits of
carbon sequestration, income and subsistence from harvesting, and
from fisheries supported by coastal wetlands.

A global rebuilding effort of exploited fish stocks could increase
fishing yields by around 15% and profits by about 80% while reducing
bycatch mortality, thereby also helping to promote recovery in
non-target species. Rebuilding fish stocks can be supported by
market-based instruments, such as rationalizing global fishing subsidies,
taxes and catch shares, to end perverse incentives and by the growth
of truly sustainable aquaculture to reduce pressure on wild stocks.
Whereas most regulatory measures focus on commercial fisheries,
subsistence and recreational fishing are also globally relevant and
need to be aligned with rebuilding efforts to achieve sustainability.


Call to action 

Rebuilding marine life requires a global partnership of diverse
interests, including governments, businesses, resource users and civil
society, aligned around an evidence-based action plan supported by a
sound policy framework, a science and educational plan, quantitative
targets, metrics for success and a business plan. It also requires
leadership to assemble the scientific and socio-economic knowledge and the
technologies required to rebuild marine life and the capacity to deploy
them. A concerted global effort to restore and protect marine life and
ecosystems could create millions of new and, in many cases,
well-paying jobs. Thus, commitments of governments, which are
required to meet the UN SDGs by 2030, need to be supported and
reinforced by commitments from society, non-government
organizations (NGOs) and other agents, such as philanthropic groups,
corporations and industry (Supplementary Information 4). The sectors that
operate in the ocean spaces, which bear considerable responsibility
for the losses thus far experienced and, in many cases, are likely to be
the main beneficiaries of efforts to rebuild marine life, must change
their ethos to commit to a net positive conservation impact as part of
their social license to operate in the ocean space. The use of the ocean
by humans should be designed for net positive conservation impact,
creating additional benefits that increase prosperity and catalyse
political will to deploy further efforts in a positive feedback spiral of
ocean bounty.

The long-term commitment to rebuilding marine life requires a 
powerful narrative, supported by scientific evidence that conveys its 
feasibility in the face of climate change and a growing human popula- 
tion, its alignment with societal values, and its widespread societal 
benefits. Growing numbers of success stories could shift the balance
from a wave of pessimism that dominated past scientific narratives
of the future ocean to evidence-based ‘ocean optimism’ (for
example, #oceanoptimism in social media), conveying solutions and
opportunities for actions that help to drive positive change. This
optimism must be balanced with transparent and robust communica- 
tion of the risks posed by relevant pressures that are yet to be mitigated. 

Rebuilding marine life will benefit from nations declaring, analogous
to the Paris Agreement on climate change, NDCs towards rebuilding
marine life. NDCs aimed at rebuilding marine life will be essential for
accountability, auditing milestones and forecasting success in reaching
goals. NDCs can include both commitments for action within national
Exclusive Economic Zones and a catalogue of actionable opportunities
available to investors, corporations and philanthropists.

The global policy framework required to rebuild marine life is largely 
in place through existing UN mechanisms (targets to be adopted in 
2020 under the Global Biodiversity Framework of the CBD, SDGs and 
Paris Agreement of the UNFCCC), if their most ambitious goals are
implemented, along with additional international conventions such as the
Bonn Convention on the Conservation of Migratory Species of Wild 
Animals, the Moratorium on Commercial Whaling of the International 
Whaling Commission (1982), Ramsar Convention on Wetlands of Inter- 
national Importance and CITES, among others. High-level coordination 
among all UN instruments and international policies addressing the 
oceans, including the high seas, is needed. 

The UN initiated, in 2018, an Intergovernmental Conference to reach
a new legally binding treaty to protect marine life in the high seas by
2020. This proposed treaty could enhance cooperation, governance
and funds for conservation and restoration of high-seas and deep-sea
ecosystems damaged or at risk from commercial interests. This
mandate would require funding of around US$30 million annually,
which could be financed through long-term bonds in international
capital markets or taxes on resource extraction. Internationally
agreed contributions will also be required, because populations of
many species are shared across Exclusive Economic Zones of multiple
nations. This approach could follow the model of the Regional
Fisheries Management Organizations, bringing together nations to
manage shared fish stocks that straddle national waters and the high
seas. For example, in September 2010 the Convention for the Protection
of the Marine Environment of the North-East Atlantic (OSPAR)
established the world’s first MPA network on the high seas covering
286,200 km² (ref. 7).

Rebuilding marine life will also require active oversight, participation
and cooperation by local, regional and national stakeholders.
Readiness and the capacity to implement recovery wedges differ across
nations, and cooperation to rebuild marine life should remain flexible to
adapt to variable cultural settings; locally designed approaches may be
most effective (Supplementary Information 1). Past failures in some
nations can inform new governance arrangements to avoid repeating
the same mistakes elsewhere. Rebuilding marine life should draw on
successful marine policy formulation, management actions and
technologies to nurture a learning curve that will propel future outcomes
while reducing cost. For instance, many developed nations have
already implemented nutrient reduction plans; however, fertilizer
use is rising globally, driven mainly by demand from developing
nations that also continue to develop their shorelines. Adopting the
measures now in place in developed nations to increase nitrogen-use
efficiency in South and East Asia could lower global synthetic fertilizer
use by 2050, even under the increased crop production required to
feed a growing population.

Calls for international assistance to support recovery, whether it
is for coastal wetlands to reduce risks of damages from natural
disasters or marine life generally, should include assistance to improve
governance and build institutional capacities. However, the capacity
of both developed and developing nations to deploy effective recovery
actions is already substantial. Mangrove restoration projects are
considerably larger and cheaper but similarly successful (about 50%
survival reported) in developing nations compared with developed
countries, and small-island states are showing growing leadership in
response to plastic pollution and the marine impacts of climate change
(https://www.aosis.org/). However, many developing countries need
particularly high levels of investment to conserve and restore habitats
that protect populations at risk in low-lying coastal areas, which could
be financed through international climate change adaptation funds.
Currently, the UN’s Green Climate Fund has mobilized US$10.3 billion
annually to assist developing countries to adapt to climate change, with
a goal of US$100 billion per year in 2020
(https://www.greenclimate.fund/how-we-work/resource-mobilization).
Allocating a sizeable fraction of these funds to developing countries for
the conservation and restoration of ‘blue infrastructure’ (for example,
saltmarshes, oyster and coral reefs, mangroves and seagrass beds) could
increase the resilience of coastal communities to climate change and to
extreme events while improving their livelihoods.


Conclusion 


Based on the data reviewed here, we conclude that substantial rebuild- 
ing across many components of marine life by 2050 is an achiev- 
able Grand Challenge for science and society. Meeting this challenge 
requires immediate action to reduce relevant pressures, including 
climate change, safeguarding places of remaining abundance, and 
recovering depleted populations, habitats and ecosystems elsewhere. 
This will require sustained perseverance and substantial commitment 
of financial resources, but we suggest that the ecological, economic 
and social gains will be far-reaching. Success requires the establishment 
of a committed and resilient global partnership of governments and
societies aligned with this goal, supported by coordinated policies, 
adequate financial and market mechanisms, and evolving scientific and 
technological advances that nurture a fast learning curve of rebuild- 
ing interventions. Meeting the challenge of substantially rebuilding 
marine life would be a historic milestone in humanity’s quest to achieve
a globally sustainable future. 
Conservation laws are deeply related to any symmetry present in a physical system.
Analogously to electrons in atoms exhibiting spin symmetries, it is possible to
consider neutrons and protons in the atomic nucleus as projections of a single
fermion with an isobaric spin (isospin) of t = 1/2 (ref. *). Every nuclear state is thus
characterized by a total isobaric spin T and a projection Tz, two quantities that are
largely conserved in nuclear reactions and decays. A mirror symmetry emerges from
this isobaric-spin formalism: nuclei with exchanged numbers of neutrons and
protons, known as mirror nuclei, should have an identical set of states, including their
ground state, labelled by their total angular momentum J and parity π. Here we report
evidence of mirror-symmetry violation in bound nuclear ground states within the
mirror partners strontium-73 and bromine-73. We find that a Jπ = 5/2⁻ spin assignment
is needed to explain the proton-emission pattern observed from the T = 3/2
isobaric-analogue state in rubidium-73, which is identical to the ground state of
strontium-73. Therefore the ground state of strontium-73 must differ from its
Jπ = 1/2⁻ mirror bromine-73. This observation offers insights into
charge-symmetry-breaking forces acting in atomic nuclei.


Determining the properties and structure of ⁷³Rb was primarily
motivated by the role this nucleus plays in the rapid proton capture process
that is thought to drive thermonuclear type-I X-ray bursts. Previous
attempts to detect ⁷³Rb directly have not been successful, owing to
its very short half-life, which arises as a result of its proton-unbound
ground state. In order to characterize the structure of states in ⁷³Rb,
the nucleus was populated via the β decay of the longer-lived ⁷³Sr, a
technique that has proved effective for several other proton-unbound
nuclei.

The experiment was performed at the National Superconducting
Cyclotron Laboratory (NSCL), which provided a mixed beam of
radioactive nuclei containing ⁷³Sr, derived from fragmentation of ⁹²Mo
(see Methods). Each ion was identified (shown in Fig. 1) before passing
through a stack of silicon detectors, where it was stopped in a
double-sided segmented silicon implantation detector so that its
subsequent decays could be studied. The segments on the front and back
of the detector are perpendicular to each other, enabling spatial
localization of the implantation event, which considerably reduces the
background when searching for decay events. Over the course of the run,
427 ⁷³Sr implantation events were unambiguously identified. In a given
⁷³Sr decay event, a positron (β⁺) is emitted first, quickly followed by the
emission of a proton. The β⁺ particles have a continuous energy
distribution, and usually leave only a small fraction of their energy in the
silicon detector. However, the emitted proton is stopped and deposits all of
its energy into the silicon implantation detector. The summed energy
deposited by the β⁺ particles and protons results in an energy broadening
and shift in the charged-particle spectra (referred to as β summing).
The implantation detector was surrounded by germanium detectors
to measure γ-rays in coincidence with these decay events to connect
the de-excitation of the daughter nucleus to proton-emitting states.

The time between the implantation of ⁷³Sr ions into the silicon
detector and the subsequent charged-particle events is presented in
Fig. 2a, and the data show good agreement with an exponential decay
of one species and a constant random background. The half-life of ⁷³Sr
was determined to be t₁/₂ = 23.1 ± 1.4 ms (all errors herein are 1σ) from
the logarithmic-bin method, providing, to our knowledge, the best
direct half-life measurement of ⁷³Sr so far (see Extended Data Fig. 1
and Methods).

The energy spectrum of ⁷³Sr β-delayed proton-emission events is
shown in Fig. 2b, with the measured background denoted by the shaded
blue overlay (see Methods). Two strong peaks are observed. The largest
peak, found at 3.93 ± 0.012 MeV, is attributed to protons emitted
from the T = 3/2 isobaric-analogue state (IAS) in ⁷³Rb, referred to as
⁷³Rb*(IAS), which leaves behind ⁷²Kr in its ground state. Correcting for
β summing (see Methods) gives a proton energy of 3.80 ± 0.02 MeV,
which is in agreement with the previous direct measurement of
3.75 ± 0.04 MeV. The second strong peak is attributed to the branching
of ⁷³Rb*(IAS) decays to the ⁷²Kr*(2⁺) excited state. This is confirmed
by the observation of 709-keV γ-rays that are promptly correlated
with events in this second proton peak, shown by the inset to Fig. 2b.
A peak in the γ-ray spectrum at 511 keV is expected, because two
511-keV γ-rays are emitted in the annihilation of the β⁺ with electrons.
The observation of 10 coincident 709-keV γ-ray events is consistent
with almost all protons in this lower-energy peak proceeding to the
Jπ = 2⁺ state, and <10% to the nearby 671-keV excited ⁷²Kr*(0⁺) state at
the 90% confidence limit.

Fig. 1 | Particle identification plot. Particle identification was deduced from
the energy loss of the incoming heavy ions passing through the first silicon
detector in the stack (ΔE) versus the time of flight of the ion after exiting the
A1900 fragment separator. The raw uncalibrated signals from the detectors are
presented, with the analogue-to-digital converter (ADC) channel for the

After accounting for the branching of the proton emission, the
β-decay feeding to ⁷³Rb*(IAS) was determined to be 63(3)%, as indicated
in Fig. 3. This branching ratio, when combined with the predicted ⁷³Sr
mass from the most recent atomic mass evaluations, yields a value of
log(ft), a measure of the structural overlap between the initial and final
states, of 3.45(6). This value of log(ft) is consistent with the
conservation of isobaric spin (that is, a ΔT = 0 superallowed decay) between
pure IASs. It should be noted that some isobaric-spin mixing is expected in
the A = 73 atomic mass region (enabling ΔT = 1 transitions), which would
reduce the β branching to the IAS, but our measurements cannot
assess the degree of such mixing.
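
As a rough illustration of how the quantities in this paragraph fit together, the sketch below converts the measured half-life and the β branching to the IAS into a partial half-life, and then asks what phase-space factor f the quoted log(ft) would imply. It assumes the conventional definition in which t is the partial half-life in seconds. The function names are hypothetical, and the implied f is back-calculated from the quoted numbers rather than being a value reported here; the actual analysis obtains f from the decay energy, which depends on the predicted mass.

```python
import math


def partial_half_life(total_half_life_s, branching_fraction):
    """Partial half-life (s) of one decay branch: t = T_1/2 / branching."""
    return total_half_life_s / branching_fraction


def implied_phase_space_factor(log_ft, partial_half_life_s):
    """Phase-space factor f implied by a given log10(f * t)."""
    return 10.0 ** log_ft / partial_half_life_s


# Values quoted in the text: T_1/2 = 23.1 ms, branching 63%, log(ft) = 3.45.
t_partial = partial_half_life(23.1e-3, 0.63)
f_implied = implied_phase_space_factor(3.45, t_partial)

print(f"partial half-life to the IAS: {t_partial * 1e3:.1f} ms")
print(f"phase-space factor implied by log(ft) = 3.45: {f_implied:.3g}")
```

The uncertainties on the half-life and branching are ignored in this sketch; propagating them would be needed to reproduce the quoted error on log(ft).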

The branching of ⁷³Rb*(IAS) is unusual as compared to similar systems
just below the A = 73 mass region. In particular, β-delayed protons from
the nuclei ⁶⁵Se and ⁶⁹Kr predominately populate either the ground state
or the excited states of the daughter nucleus, rather than fractionating
to the degree observed for ⁷³Rb*(IAS). In the case of ⁶⁵Se, which
has a Jπ = 3/2⁻ ground state, the resulting decay of ⁶⁵As*(IAS) almost
completely proceeds to the 0⁺ ground state of ⁶⁴Ge. The opposite is true
for ⁶⁹Kr, for which the ground state and the corresponding ⁶⁹Br*(IAS)
have Jπ = 5/2⁻, and thus ⁶⁹Br*(IAS) decays almost exclusively to the first
excited 2⁺ state in ⁶⁸Se by emitting a proton that carries away one unit
of orbital angular momentum (ℓ = 1).
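
The pattern described in this paragraph follows from angular-momentum and parity conservation in proton emission. The sketch below enumerates which orbital angular momenta ℓ are allowed when a parent state decays to a daughter state plus a proton of spin 1/2. It is only a selection-rule check under stated assumptions: negative parity is taken for the 3/2 and 5/2 parent states, the function name and the cut-off `l_max` are hypothetical, and nothing here estimates decay widths.

```python
from fractions import Fraction


def allowed_l(j_parent, parity_parent, j_daughter, parity_daughter, l_max=6):
    """Orbital angular momenta l allowed for proton emission.

    Parities are +1 or -1. The proton carries total angular momentum
    j = l +/- 1/2, which must couple with j_daughter to give j_parent.
    """
    half = Fraction(1, 2)
    allowed = []
    for l in range(l_max + 1):
        # Parity conservation: parity_parent = parity_daughter * (-1)^l.
        if parity_daughter * (-1) ** l != parity_parent:
            continue
        for j in {abs(l - half), l + half}:
            # Triangle rule for coupling j and j_daughter to j_parent.
            if abs(j_daughter - j) <= j_parent <= j_daughter + j:
                allowed.append(l)
                break
    return allowed


# 3/2- parent decaying to a 0+ ground state.
print(allowed_l(Fraction(3, 2), -1, 0, +1))
# 5/2- parent decaying to a 0+ ground state and to a 2+ excited state.
print(allowed_l(Fraction(5, 2), -1, 0, +1))
print(allowed_l(Fraction(5, 2), -1, 2, +1))
```

Under these rules a 3/2⁻ parent can reach a 0⁺ ground state with ℓ = 1, whereas a 5/2⁻ parent needs ℓ = 3 to reach the 0⁺ state but can reach a 2⁺ state with ℓ = 1, which is the lowest of the allowed values (ℓ = 1, 3 and 5) for that branch.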

For the nuclei involved in the β-delayed proton emission of ⁷³Sr, the
structural situation is more intricate, and thus the standard shell model
approach to the wavefunctions is not appropriate. The T = 3/2
mirror-partner nucleus to ⁷³Sr is ⁷³Br, which has a highly collective and
complex structure; its ground-state spin assignment had been under debate
for almost two decades. The rotational band structure of ⁷³Br suggests that
it has a substantial deformation, and a ground state with Jπ = 1/2⁻ that
is possibly triaxially shaped. Isobaric-spin symmetry would lead us
to expect that ⁷³Sr should have a similarly highly collective structure,
and therefore ⁷³Rb*(IAS) as well. The key issue in this discussion is the
degree to which strontium and bromine differ. ⁷³Br has two differently
shaped, low-lying collective configurations, separated by only 27 keV,
where the ground state has Jπ = 1/2⁻ and the excited configuration has
Jπ = 5/2⁻. It requires only a small degree of charge-symmetry breaking
to invert the sequence of these two structures and cause a breakdown
of ground-state mirror symmetry. To this extent, the A = 73 isobar is a
special case.

relative energy loss on the vertical axis and the recorded time-to-digital
converter (TDC) channel on the horizontal axis. The colours represent the total
number of counts found. The ion of interest, ⁷³Sr, is unambiguously isolated
from neighbouring ions.

To understand the continuum and deformation effects on the open
quantum system ⁷³Rb*(IAS), we adopted the Gamow coupled-channel
(GCC) approach to model its decay. In the framework of GCC, we
used the Berggren basis, which is a complete ensemble that includes
bound, Gamow and scattering states. Hence, it provides the correct
outgoing asymptotic behaviour to describe the decay of
particle-unbound resonances, and in essence enables the treatment of nuclear
structure and reactions on the same footing. For this study, ⁷³Rb*(IAS)
was divided into a deformed core (⁷²Kr) plus a valence proton. The
interaction between the deformed ⁷²Kr core and the valence proton
is represented by a Woods-Saxon potential with a quadrupole
deformation β₂.

The states in the T = 3/2 quartet along the A = 73 isobar are dominated
by prolate deformation, and the daughter nucleus ⁷²Kr is believed to
show strong shape-mixing effects with a predominately oblate-shaped
ground state. Therefore, the decay of ⁷³Rb*(IAS) to the ground-state
rotational band of ⁷²Kr might undergo a transition from a prolate to
oblate shape, which would suppress the decay process by reducing
the decay width, Γp. As no shape-mixing effect can be incorporated into
the GCC model, calculations were performed for different deformations
and spin assignments of ⁷³Rb*(IAS). The spin assignments were
chosen based on the ground-state and first-excited-state spins of ⁷³Br.
The values β₂ = −0.34 and +0.4 were chosen for the oblate and prolate
shapes, respectively, taken from experimental values for the ground
states of ⁷³Br and ⁷²Kr.

Fig. 2 | Decay spectra of ⁷³Sr β-delayed proton emission. a, A plot of the
correlation time between implantation of ⁷³Sr ions and their subsequent decay.
The solid red curve shows the resulting fit for an exponential with a constant
background. The individual components of the fit are shown by the blue
dashed lines. The horizontal error bars correspond to the bin size, and the
vertical error bars correspond to one standard deviation from counting. b, The
time-gated (t < 200 ms) energy spectrum of β + p decay events observed after
the implantation of ⁷³Sr ions. The solid red curve is the best fit of the ⁷³Rb*(IAS)
decay peaks in the spectrum (reduced χ² = 1.5). The inset shows the γ-ray spectrum
gated on the lower energy peak by the gate G, highlighting the de-excitation of
⁷²Kr*(2⁺). The shaded blue overlay shows the measured background; the
horizontal error bars correspond to the bin size and the vertical error bars
correspond to one standard deviation from counting. g.s., ground state.

On the basis of the predicted branching ratios for ⁷³Rb*(IAS)
obtained from the GCC calculations, shown in Extended Data Table 1,
the only spin assignment consistent with the data is Jπ = 5/2⁻, when
the ⁷²Kr core is described with oblate-shaped deformation. In this
case, ⁷³Rb*(IAS) decays to the ground state of ⁷²Kr through the ℓ = 3
channel, and to the first excited Jπ = 2⁺ state through the ℓ = 3 and ℓ = 1
channels. The lower centrifugal barrier of the p-wave (ℓ = 1) component
compensates for the smaller decay energy of the first excited Jπ = 2⁺
state. Therefore, the decay widths for the ground state and the first
excited state are roughly equivalent, even though the configurations
of the calculation might be slightly different when considering the
effect of shape mixing or changing calculation parameters. The
shape-mixing effect is expected to have a similar impact on both transitions;
it should roughly cancel out in the branching ratio. Nevertheless,
the conclusion that the small admixture of low-angular-momentum
components into the wavefunction has a major impact on the decay
process is robust and indicates the important role of deformation
on the fine structure of decays via proton emission. This feature has
been observed before, though not to the same degree, in the proton
emitters ¹³¹Eu and ¹⁴¹Ho.
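
To give a feeling for why a p-wave component can compensate for a smaller decay energy, the sketch below evaluates the centrifugal energy ℓ(ℓ+1)ħ²/(2μr²) for a proton outside an A = 72 core. It is a back-of-the-envelope estimate, not the GCC calculation: the radius r = 1.2 fm × A^(1/3) is an assumed illustration value, the core mass is approximated as A proton masses, and the Coulomb and deformed nuclear potentials are ignored. The function name is hypothetical.

```python
HBAR_C = 197.327       # MeV fm
PROTON_MASS = 938.272  # MeV/c^2


def centrifugal_barrier(l, radius_fm, core_mass_number):
    """Centrifugal energy l(l+1) hbar^2 / (2 mu r^2), in MeV."""
    # Reduced mass of the proton + core system, core mass ~ A * m_p.
    reduced_mass = PROTON_MASS * core_mass_number / (core_mass_number + 1)
    return l * (l + 1) * HBAR_C ** 2 / (2.0 * reduced_mass * radius_fm ** 2)


# Assumed radius for illustration only: r = 1.2 fm * A^(1/3), with A = 72.
radius = 1.2 * 72 ** (1.0 / 3.0)
for l in (1, 3):
    print(f"l = {l}: {centrifugal_barrier(l, radius, 72):.2f} MeV")
```

At any fixed radius the ℓ = 3 centrifugal term is six times the ℓ = 1 term, so even a small ℓ = 1 admixture can carry a large share of the decay width.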

Isobaric spin is clearly not a perfect symmetry considering protons
and neutrons have different electric charges, their masses are slightly
different (0.14%) and their magnetic moments differ substantially
in both magnitude and sign. Moreover, the nuclear force is stronger
between neutron-proton (np) pairs than between like-nucleon pairs
(nn and pp). With that in mind, it is not at all surprising that nuclear
charge-symmetry breaking emerges from the small differences
between nucleons and their interactions. Indeed, it is the robust nature
of isobaric-spin symmetry that is noteworthy, but those occasions when
it breaks down offer a chance to learn more about the forces acting
inside the atomic nucleus.

Fig. 3 | Proposed level scheme. The level scheme details the β-delayed proton
emission of ⁷³Sr through the IAS in ⁷³Rb, providing the measured β branching to
the IAS and the subsequent proton branching. The ground state and first
excited state of ⁷³Br are given in the inset, contrasted with the ground-state spin
assignment of ⁷³Sr.

The only other known case of isobaric-spin-symmetry breaking that
results in different ground states between mirror nuclei (see Extended
Data Fig. 2) is in the T = 1 mirror pair ¹⁶F/¹⁶N, in which ¹⁶F is particle
unbound and ¹⁶N is particle bound. This case of isobaric-spin-symmetry
breaking is well explained as a consequence of the Coulomb force, in an
effect known as the Thomas-Ehrman shift. The Thomas-Ehrman
shift comes into play for an unbound or loosely bound proton state
(the valence proton of ¹⁶F), because the wavefunction of the proton
extends well beyond the surface of the nucleus, resulting in a different
asymptotic behaviour than for the bound mirrored neutron (the
valence neutron of ¹⁶N). Such a mechanism is not immediately apparent
in the case of ⁷³Sr/⁷³Br, and it may be that charge-symmetry-breaking
forces need to be incorporated into the nuclear Hamiltonian to fully
describe the presented results.

In this Article we report the breakdown of mirror symmetry
between the ground states of the particle-bound nuclei ⁷³Sr and ⁷³Br,
which appear to have Jπ = 5/2⁻ and Jπ = 1/2⁻, respectively. This
difference probably comes about from an inversion of states, which in ⁷³Br
are only 27 keV apart. However, the consequences are appreciable
because the β decay is strongly modified. This inversion could be
due to small changes in the two competing shapes, particularly their
degree of triaxiality, and the coupling to the proton continuum in the
IAS of ⁷³Rb. In fact, in the exotic region of the chart of nuclides near
⁷³Sr, where the limits of existence for proton-rich nuclei intersect
with the N = Z line, there may be many more instances of
mirror-symmetry breaking.

To confirm the findings presented here, the ground-state spin of
⁷³Sr should be directly measured through β-NMR or similar methods.
A direct measurement of the mass of ⁷³Sr would also be informative in
determining the degree to which isobaric-spin symmetry is broken.
With existing facilities it will be difficult to make such direct
determinations, because the yield of ⁷³Sr atoms is low; however, as new
facilities come online, studying such exotic nuclei should become possible,
enabling continued investigations and a deeper understanding of the
cracked isobaric-spin mirror.
Methods


Experimental method 

The experiment used a primary ⁹²Mo beam at an energy of 140 MeV per
nucleon, undergoing projectile fragmentation on a 152.2 mg cm⁻²
beryllium target. Fragmentation products were then passed through the
A1900 fragment separator, selecting for ⁷³Sr (ref. **). The secondary ⁷³Sr
beam was further purified by a factor of 4,500 after passing through
the Radio Frequency Fragment Separator (RFFS). The remaining
transmitted ions were then sent through a telescope stack consisting
of a 1,041-μm silicon p-i-n detector, a variable-thickness aluminium
degrader, a 989-μm silicon p-i-n detector, a 520-μm double-sided
silicon strip detector (DSSSD) used for implantation, and another
996-μm single-sided silicon strip detector (SSSD) followed by a plastic
scintillator that was used for vetoing ions that were not implanted. The
DSSSD was segmented with 40 front and 40 back strips, and the SSSD had
16 strips. The stack was surrounded by a high-purity germanium array,
the Segmented Germanium Array (SeGA), that was used to measure γ-rays.

Nuclei of interest were implanted into the DSSSD, allowing
for spatial and temporal correlations of implantation and decay events.
These heavy ions were identified event-by-event using the measured
energy loss in the 1,041-μm silicon detector at the front of the stack,
and the time of flight between the second silicon detector in the stack
and a scintillator located at the exit of the focal plane of the A1900. The
resulting particle identification spectrum for the region of interest is
shown in Fig. 1. Ion identification was confirmed by the observation of
known γ-rays in the region of interest.

All of the detector signals were collected using a digital
data-acquisition system that used XIA Pixie-16 digitizers, which provided
waveforms of the signals as well as timing and pulse-height data. The
digitizers had 250-MHz ADCs and 100-MHz clocks that gave 10-ns
timestamps. For the presented offline analysis, a 5-μs gate was used
to determine prompt coincidences. The beam rate was about 6.5(1.3)
particles per second.

Because the energies associated with the implantation and decay
events differ by several orders of magnitude (GeV and MeV,
respectively), the DSSSD was connected to dual-gain preamplifiers.
The low-gain setting was used for implantation events and the high-gain
setting for decay events. The DSSSD high-gain channels were energy
calibrated with ²²⁸Th and ¹⁴⁸Gd sources. SeGA was energy calibrated with
a mixed source of well known activity (primarily containing Eu), which
was also used for determining an absolute efficiency curve.


Experimental analysis 

After an ion was tagged by energy-loss and time-of-flight measure- 
ments, the ion-implantation event was localized within a pixel defined 
by the perpendicular front and back strips of the DSSSD with the largest 
charge deposition. Decay events were searched for within a 5-s correla- 
tion window, and only events that were within two neighbouring pixels 
(for a total of 24 surrounding pixels) of the implantation event were 
considered. All decay events were rejected if another implantation 
event occurred within 10 half-lives of the ion of interest, ⁷³Sr.
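
The correlation cuts described above can be sketched as a simple predicate. In the illustration below each event is a hypothetical `(time_s, front_strip, back_strip)` tuple, a decay is accepted if it follows the implantation within the 5-s window and lies within two strips of the implantation pixel in both directions, and a helper counts the surrounding pixels this cut accepts on a 40 × 40 strip detector. The event layout and function names are assumptions for illustration, and the rejection of decays when a second implantation occurs nearby in time is not modelled.

```python
def is_correlated(implant, decay, window_s=5.0, max_offset=2):
    """True if a decay passes the time and pixel cuts for an implantation.

    Each event is a (time_s, front_strip, back_strip) tuple.
    """
    dt = decay[0] - implant[0]
    d_front = abs(decay[1] - implant[1])
    d_back = abs(decay[2] - implant[2])
    return 0.0 < dt <= window_s and d_front <= max_offset and d_back <= max_offset


def surrounding_pixels(front, back, n_strips=40, max_offset=2):
    """Pixels, other than the implantation pixel, accepted by the pixel cut."""
    pixels = []
    for f in range(front - max_offset, front + max_offset + 1):
        for b in range(back - max_offset, back + max_offset + 1):
            on_detector = 0 <= f < n_strips and 0 <= b < n_strips
            if on_detector and (f, b) != (front, back):
                pixels.append((f, b))
    return pixels


print(len(surrounding_pixels(20, 20)))  # interior pixel
print(len(surrounding_pixels(0, 0)))    # corner pixel
print(is_correlated((0.0, 10, 10), (0.020, 12, 9)))
```

For an interior implantation pixel the cut spans a 5 × 5 block, that is, the implantation pixel plus 24 surrounding pixels; near the detector edge fewer pixels are available.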


Logarithmic-bin method 

The half-life was determined using the logarithmic-bin method, in which
the ratio of the bin size to the correlation time (Δt/t) is constant, which
is better suited for low-statistics analysis. The resulting plot is shown
in Extended Data Fig. 1, and the maximum logarithmic likelihood fit is
given by the solid red curve. Because of the nature of this method, instead
of correlating all events within a given time window after the
implantation (as was done for analysing the decay energy), only the first
event after implantation was considered. Furthermore, the peak position of
the probability distribution is directly related to the half-life of the
species. Therefore, if other species are present then they will be well
separated. Thus in the fit to the peak shown in Extended Data Fig. 1 the events


above 3 x 10° ns were not considered. The resulting fit of this distribution 
(reduced χ² = 1.3) provided a better limit on the half-life of ⁷³Sr, and so this is
the half-life reported and used for the exponential in Fig. 2a. The half-life
obtained from directly fitting the data in Fig. 2a is t₁/₂ = 23.5 ± 1.8 ms.
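
The link between the peak position and the half-life can be made concrete. For exponentially distributed decay times with decay constant λ, the density in θ = ln t is λt·exp(−λt), which peaks at t = 1/λ, so the half-life is ln 2 times the peak position. The sketch below bins simulated first-decay times in constant-width bins of log₁₀ t and recovers the half-life with a simple background-free maximum-likelihood estimate. The simulated sample, the seed, the bin width and the function names are assumptions for illustration; the fit used in the analysis, including any background treatment, is not reproduced here.

```python
import math
import random


def log_time_density(theta, decay_constant):
    """Probability density of theta = ln(t) for exponentially distributed t."""
    t = math.exp(theta)
    return decay_constant * t * math.exp(-decay_constant * t)


def log_binned_counts(times, bins_per_decade=4):
    """Histogram times in bins of constant width in log10(t) (constant dt/t)."""
    counts = {}
    for t in times:
        if t <= 0.0:
            continue  # log10 is undefined for non-positive times
        index = math.floor(math.log10(t) * bins_per_decade)
        counts[index] = counts.get(index, 0) + 1
    return dict(sorted(counts.items()))


def half_life_mle(times):
    """Maximum-likelihood half-life for background-free exponential decays."""
    return math.log(2.0) * sum(times) / len(times)


# Simulated first-decay times in ms, assuming a 23.1 ms half-life.
random.seed(1)
decay_constant = math.log(2.0) / 23.1
times = [random.expovariate(decay_constant) for _ in range(427)]

print(log_binned_counts(times))
print(f"estimated half-life: {half_life_mle(times):.1f} ms")
```

Because the peak of the log-time distribution sits at the mean lifetime, a second species with a very different half-life would show up as a separate peak in the same histogram.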

It should also be noted that the observation of only one species,
deduced from Extended Data Fig. 1, suggests that we are only
considering ground-state decays of ⁷³Sr. In the fragmentation process we do
expect the population of excited states in the nucleus, and thus a
potential low-lying Jπ = 1/2⁻ state may be populated. Such states will
predominately decay by internal conversion (ejecting an orbital electron),
so their decay is enhanced. Since the ions are fully stripped while in
flight, decays via internal conversion will be completely suppressed.
However, once the ion is implanted it will recombine with electrons from
the detector medium, opening up this decay path. The half-lives for such
low-lying excited states (in particular E2 transitions separated by
~10 keV) will be ~1–100 μs, considering the conversion coefficients for
strontium and the Weisskopf estimates of the γ-decay half-lives. These
estimates are also consistent with systematic trends in the region. With a
deadtime after implantation of ~5 μs for our measurements, the population
of such states will mostly decay to the ground state of ⁷³Sr before the
implantation detector becomes sensitive. In any case, if a separate
species were present with a half-life greater than our deadtime then it
would be observed in Extended Data Fig. 1.


β-summing correction

GEANT4 simulations of the detector configuration, coupled with LISE++
simulations of the implantation depth distribution, suggest that a
β-summing correction of 110 ± 15 keV needs to be applied to extract
the proton energies. This gives a value of $Q_{p,\mathrm{measured}} = 3.82$ MeV,
where $Q_{p,\mathrm{measured}}$ is the total measured energy released in the
decay, which is split between the proton and the remaining nucleus.
However, we also need to include the effect of pulse-height defects in
measuring the energy of the recoil nucleus, using
$Q_p = Q_{p,\mathrm{measured}} + (1 - K)\,Q_p/M$, where K is the detection
efficiency of the recoil (~30% for our case) and M is the total mass of the
decaying system (M = 73 AMU in our case). Applying this correction gives
the true value, $Q_p = 3.85$ MeV. To obtain the energy of the emitted
protons in the laboratory frame, we also need to account for the recoil
energy of the resulting ⁷²Kr. Thus, the reported energy of the proton is
$E_p = [(M - 1)/M]\,Q_p$.
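
The chain of corrections in this section can be followed numerically. The sketch below starts from the measured peak position quoted in the main text (3.93 MeV), subtracts the 110-keV β-summing correction, solves the pulse-height-defect relation Q_p = Q_measured + (1 − K)Q_p/M for Q_p, and then takes the proton's share (M − 1)/M of the decay energy. The function names are hypothetical, only central values are used, and uncertainties are not propagated.

```python
def true_decay_energy(q_measured_mev, recoil_efficiency, mass_number):
    """Solve Q_p = Q_measured + (1 - K) * Q_p / M for Q_p (MeV)."""
    return q_measured_mev / (1.0 - (1.0 - recoil_efficiency) / mass_number)


def proton_lab_energy(q_p_mev, mass_number):
    """Proton share of the decay energy: E_p = [(M - 1) / M] * Q_p (MeV)."""
    return (mass_number - 1) / mass_number * q_p_mev


peak_mev = 3.93           # position of the largest peak, from the main text
beta_summing_mev = 0.110  # beta-summing correction from the simulations
q_measured = peak_mev - beta_summing_mev

q_p = true_decay_energy(q_measured, recoil_efficiency=0.30, mass_number=73)
e_p = proton_lab_energy(q_p, mass_number=73)

print(f"Q_measured = {q_measured:.2f} MeV")
print(f"Q_p        = {q_p:.3f} MeV")
print(f"E_p        = {e_p:.2f} MeV")
```

With these rounded inputs the sketch gives a decay energy of about 3.86 MeV and a proton energy of about 3.80 MeV, in line with the values reported above to within rounding.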


Fitting the decay-energy spectrum 

The background of the decay-energy spectrum (the shaded blue overlay
in Fig. 2b) was determined by analysing decay events in the 5-s
correlation window that occurred 1 s after implantation. After background
subtraction, a χ² minimization of the fit to the decay-energy spectrum
was constrained by fitting the largest peak with a Landau distribution
(generated by the β⁺ particle) convoluted with a Gaussian distribution
(generated by the proton) of the measured intrinsic detector resolution
(σ = 45 keV). These shape parameters for the distribution were
then fixed, and a second peak with the same shape parameters was
added. The energy of the second peak was fixed to be 709 keV lower
than the original peak. The two peak heights, as well as the energy of the
original peak, were then allowed to vary.

From the spectrum shown in Fig. 2b, we do not see a notable number
of events above background that are below 1 MeV. Owing to the
thickness of our detector and the large value of Q (the total energy released
in the β⁺ decay), we do not expect many, if any, β⁺ particles to deposit
more than 1 MeV into a single (or several) strip(s) of our detector,
especially when considering the results of our simulation. As such, our data
indicate that virtually all β-decay events of ⁷³Sr are followed by the
emission of a proton from ⁷³Rb.


Gamow coupled-channel analysis 
For this work, the rotational band of the core (⁷²Kr) up to 8⁺ is
included, with the core rotational energies taken from experiment. The
effective core-valence potential has been taken to be a deformed
Woods-Saxon form including the spherical spin-orbit term with the
‘universal’ parameter set, which has been successfully applied to nuclei
from the A = 80 region. The Coulomb core-proton potential is calculated
assuming that the core charge $Z_{\mathrm{core}}e$ (e, unit of electron
charge) is uniformly distributed inside the deformed nuclear surface.
Since the decay width is very sensitive to the separation energy, in order
to have a better description of the decay width, the Woods-Saxon depth
$V_0$ is readjusted to fit the experimental decay energy $Q_p = 3.85$ MeV.
The predicted spectra of ⁷³Sr and ⁷³Br using this decay energy and the
‘universal’ parameter set are shown in Extended Data Table 2.

The calculations were carried out in the model space defined by
max(ℓ) < 20, where ℓ is the orbital angular momentum between the
proton and core. The Berggren basis was used for all channels, and
the complex-momentum contour of the Berggren basis is defined by
the path k = 0 → 0.4 − 0.2i → 0.6 → 2 → 4 → 8 fm⁻¹, with each segment
discretized by 30 points (scattering states).
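The contour discretization described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Gauss-Legendre quadrature on each straight segment (the text states only that each segment carries 30 points), it reads the second vertex as 0.4 − 0.2i fm⁻¹, and the function name `berggren_contour` is hypothetical.

```python
import numpy as np


def berggren_contour(vertices, n_per_segment=30):
    """Return complex momenta k_i and weights w_i along a piecewise-linear contour.

    Each straight segment between consecutive vertices is discretized with
    Gauss-Legendre nodes mapped from [-1, 1] onto the (complex) segment.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_per_segment)
    ks, ws = [], []
    for start, end in zip(vertices[:-1], vertices[1:]):
        half_length = 0.5 * (end - start)
        midpoint = 0.5 * (end + start)
        ks.append(midpoint + half_length * nodes)
        ws.append(half_length * weights)
    return np.concatenate(ks), np.concatenate(ws)


# Contour vertices in fm^-1 (the complex vertex is an assumption, see above)
VERTICES = np.array([0.0, 0.4 - 0.2j, 0.6, 2.0, 4.0, 8.0], dtype=complex)
k, w = berggren_contour(VERTICES)

print(len(k), "scattering states; contour integral of dk =", w.sum())
```

With five segments this gives 150 discretized scattering states per channel; the weights are complex because the first two segments leave the real axis.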


Pauli blocking 
The supersymmetric transformation method is a projection technique
that can prevent the valence proton from being emitted through
already filled orbitals by adding a repulsive core near the origin. For
simplicity, spherical orbitals that correspond to the deformed levels
occupied in the daughter nucleus are projected out. To estimate
the uncertainty of this projection technique, additional calculations
were performed with Pauli blocking removed, which reduces the GCC
calculations to solving the coupled-channel Schrödinger equation using
nonadiabatic coupling. As a result, the branching to ⁷²Kr*(2⁺) for the
oblate $J^\pi = 5/2^-$ solution decreased to 15%, because the $p$-wave
configuration was considerably reduced (down to 0.02%). However, the
presence of a very small ℓ = 1 component still allows for a large degree
of branching. Therefore, both cases indicate that ⁷³Rb*(IAS) has spin
and parity $J^\pi = 5/2^-$, and thus that ⁷³Sr has a $J^\pi = 5/2^-$ ground
state, suggesting that the ground and first excited states of ⁷³Br are
inverted relative to its mirror ⁷³Sr.
Data availability


Raw data were generated at the National Superconducting Cyclotron 
Laboratory large-scale facility. All of the relevant data that support the 
findings of this study are available from the corresponding authors 
upon reasonable request. 
are plotted with logarithmic bins. The resulting maximum logarithmic standard deviation from counting.
Extended Data Fig. 2 | The mirror chart of nuclides. Mirror nuclei are plotted
according to the isobaric spin (T) of their ground-state configurations. For
almost the entire mirror chart, the spin and parity, $J^\pi$, of the ground states are
identically reflected across the N = Z line. The black squares with cracks show
the only two places on the mirror chart where this ground-state mirror
symmetry is known or believed to be broken. After adjusting for the energy
shift of levels due to charge-breaking forces, the relative masses (ΔM) of mirror
pairs (with the same magnitude of $T_z$) become comparable, and the connection to
IASs in neighbouring nuclei becomes clearer. This is illustrated by the isobar
diagrams comparing the relative masses for two T = 3/2 multiplets, one in the
A = 9 system and the other in the A = 73 system of interest.
Extended Data Table 1 | GCC analysis

Decay                       | Transitions                  | Γp (keV)* | Configurations†
5/2⁻ → g.s. band (oblate)   | 49.6% 0⁺; 49.5% 2⁺; 1.1% 4⁺  | …         | …%(f5/2, 0⁺); 35.0%(f5/2, 2⁺); 6.2%(p1/2, 2⁺); 6.3%(f7/2, 4⁺)
1/2⁻ → g.s. band (oblate)   | 99.6% 0⁺; 0.4% 2⁺; 0.1% 4⁺   | 39.8      | 78.8%(p1/2, 0⁺); 19.8%(f5/2, 2⁺); 1.0%(p3/2, 2⁺); 0.4%(h9/2, 4⁺)
5/2⁻ → g.s. band (prolate)  | 8.2% 0⁺; 90.5% 2⁺; 1.2% 4⁺   | …         | 23.1%(f5/2, 0⁺); 40.7%(p1/2, 2⁺); 20.2%(f5/2, 2⁺); 10.8%(f5/2, 4⁺)
1/2⁻ → g.s. band (prolate)  | 98.5% 0⁺; 0.8% 2⁺; 0.6% 4⁺   | …         | 52.3%(p1/2, 0⁺); 42.8%(f5/2, 2⁺); 2.6%(p3/2, 2⁺); 1.9%(h9/2, 4⁺)

The possibilities for the decay of ⁷³Rb*(IAS) via proton emission using two different deformations for the ⁷²Kr core (β₂ = −0.34 and β₂ = 0.4 for oblate and prolate, respectively) and spin
assignments ($J^\pi$ = 1/2⁻ or 5/2⁻).

g.s., ground state; Γ, decay width.

*The decay width is inversely related to the half-life of the transition by the Heisenberg uncertainty principle.

†The configurations adopt the spectroscopic notation for angular momentum.

Extended Data Table 2 | Predicted spectra of ⁷³Sr and ⁷³Br

Nuclei     | oblate (β₂ = −0.34)            | prolate (β₂ = 0.4)
           | 1/2⁻      5/2⁻      Ex (MeV)   | 1/2⁻      5/2⁻      Ex (MeV)
⁷³Sr (Qn)  | −14.945   −15.430   −0.485     | −15.219   −15.019   0.200
⁷³Br (Qp)  | −2.927    −3.402    −0.475     | −3.139    −2.946    0.193

The core-nucleon interaction is the Woods-Saxon potential with the ‘universal’ parameter set. The depth of the Woods-Saxon potential is fitted to the experimental decay
energy Qn (Qp) for ⁷³Sr (⁷³Br).
Ex is the excitation energy of the $J^\pi$ = 5/2⁻ state, that is, the energy difference between the two presented states.
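The definition of the excitation energy in this table can be checked directly from the tabulated level energies. The short sketch below recomputes it as the difference between the 5/2 and 1/2 entries; the dictionary layout and the isotope labels used as keys are illustrative assumptions, while the numbers are those printed in the table.

```python
# (E(1/2), E(5/2)) in MeV, as tabulated for each nucleus and core deformation
LEVELS = {
    ("73Sr", "oblate"): (-14.945, -15.430),
    ("73Sr", "prolate"): (-15.219, -15.019),
    ("73Br", "oblate"): (-2.927, -3.402),
    ("73Br", "prolate"): (-3.139, -2.946),
}

# Excitation energy of the 5/2 state relative to the 1/2 state
excitation = {
    key: round(e_five_half - e_one_half, 3)
    for key, (e_one_half, e_five_half) in LEVELS.items()
}

for (nucleus, shape), e_x in excitation.items():
    ordering = "5/2 below 1/2" if e_x < 0 else "1/2 below 5/2"
    print(f"{nucleus} {shape}: E_x = {e_x:+.3f} MeV ({ordering})")
```

A negative value means that the 5/2 state lies below the 1/2 state, which in this table happens for the oblate core in both nuclei.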


Observing and controlling macroscopic quantum systems has long been a driving
force in quantum physics research. In particular, strong coupling between individual
quantum systems and mechanical oscillators is being actively studied. Whereas
both read-out of mechanical motion using coherent control of spin systems and
single-spin read-out using pristine oscillators have been demonstrated,
temperature control of the motion of a macroscopic object using long-lived
electronic spins has not been reported. Here we observe a spin-dependent torque and
spin-cooling of the motion of a trapped microdiamond. Using a combination of
microwave and laser excitation enables the spins of nitrogen-vacancy centres to act
on the diamond orientation and to cool the diamond libration via a dynamical
back-action. Furthermore, by driving the system in the nonlinear regime, we demonstrate
bistability and self-sustained coherent oscillations stimulated by spin-mechanical
coupling, which offers the prospect of spin-driven generation of non-classical states
of motion. Such a levitating diamond, held in position by electric field gradients
under vacuum, can operate as a ‘compass’ with controlled dissipation and has
potential use in high-precision torque sensing (refs. 12–14), emulation of the spin-boson
problem and probing of quantum phase transitions. In the single-spin limit and
using ultrapure nanoscale diamonds, it could allow quantum non-demolition read-out
of the spin of nitrogen-vacancy centres at ambient conditions, deterministic
entanglement between distant individual spins and matter-wave
interferometry (refs. 16, 19, 20).

Since the experiment by Einstein and de Haas in 1915, much work has
been carried out on the detection of atomic spins through mechanical
motion, culminating in the observation of a magnetic force from single
spins and magnetometry at the nanoscale. Conversely, single
spins and qubits have also been used to sense the motion of objects.
Single-qubit thermometry of mechanical oscillators was realized using
a superconducting qubit coupled to membranes and nitrogen-vacancy
(NV) centres coupled to cantilevers. A crucial next step is
to reach strong coupling between long-lived spins and mechanical
oscillators, which will enable ground-state cooling, as in tethered quantum
opto-mechanical platforms, and the observation of quantum
superpositions of macroscopic systems. One further prospect is the
entanglement between multiple spins, with far-reaching implications
for quantum information science and metrology. Obtaining
coupling rates that surpass the decoherence of both the spin and the
mechanical system is, however, still a challenge for most state-of-the-art
platforms. Recently, there has been renewed focus on levitating
objects, motivated by the low mass and high Q-factors that they offer,
together with the possibility of cooling their motion using embedded
spins. There is a strong analogy between this platform, where spins
move a levitating crystal, and laser-cooled atoms, where electrons
move atomic nuclei. It may thus be forecast that a levitating particle
containing a few long-lived spins will ultimately reach a level of control
similar to that of trapped ions, with bright prospects for the
above-mentioned applications.

In this work, we report a controllable torque induced by the spins 
of atoms embedded in a microscale object. Specifically, we couple 
the spin of many NV centres to the orientation of a trapped diamond 
particle. This coupling then enables us to show mechanical read-out of 
the spin resonance of the NV centres together with cooling and lasing 
of the diamond motion. 

The crystallographic structure of the NV centres is depicted in Fig. 1A.
The spin-spin interaction between the two electrons in the NV-centre
ground state lifts the degeneracy of the spin-triplet eigenstates by
D = 2.87 GHz at room temperature. Such an interaction implies that
the NV centre has a preferential quantization axis that is along one of
the four ⟨111⟩ crystal axes. In the presence of a magnetic field B at an
angle $\phi$ with respect to the NV axis, the energy difference between the
two energy eigenstates $|m_s' = \pm 1\rangle$ is about $\gamma_e B\phi$, where $\gamma_e$ is the
gyromagnetic ratio of the electron. Spin control can then be performed
using optical and microwave excitation, and the angular dependence
of the NV spin energy eigenstates is expected to allow rotation and
cooling of the diamond angular motion. Once in a magnetic state via
a resonant microwave excitation, the NV centre will tend to align the
Fig. 1 | Spin and mechanical systems. A, Sketch of the diamond
crystallographic structure hosting NV defects and the effects of rotation.
a, Equilibrium position of the diamond in the Paul trap before the microwave
excitation. The principal diamond axis P points towards the main trap axis. The
ground state spin levels and microwave drive are depicted below. The magnetic
field B lifts the degeneracy between the excited spin states by about $\gamma_e B\phi$,
where $\phi$ is the angle between the magnetic field and the NV axis. The microwave
signal prepares the NV in a magnetic state, which induces a torque on the
diamond. Inset: left, definition of $\phi$; right, representation of the NV
(N, nitrogen; V, vacancy; C, carbon). b, Once at the new angular position, the
spin projection onto the magnetic field is changed to $\gamma_e B\phi'$. The microwave


corresponding diamond crystalline axis to the magnetic field, as illus- 
trated in Fig. 1A, a. Further, laser-triggered relaxation from the excited 
state can then extract the work exchanged between the spin magnetic 
energy and the librational motion (see Fig. 1A, b). 

In our experiment, harmonic librational (sometimes called torsional,
pendular or rotational) confinement is provided both by the Paul trap
(Methods) and the particle asymmetry. We measure the libration of the
diamond by using the reflection of the laser from the diamond surface.
The micrometre-size roughness of our 15-µm particles enables a specular
pattern to be detected at the particle image plane, which, after
mode-matching one of the many bright spots to an optical fibre, yields an
angular sensitivity of about 0.3 mrad Hz$^{-1/2}$ and a resolution of about
10 mrad Mcounts$^{-1}$ s (see Methods and Extended Data Fig. 4). Under
vacuum conditions (~1 mbar), the signal power spectrum plotted in Fig. 1B
shows harmonic motion of the three librational modes with frequencies
$\omega_\phi/2\pi$ ranging from 200 Hz to 1.2 kHz and with a damping rate of about
15 Hz. Figure 1B also shows an optically detected magnetic resonance (ODMR)
spectrum for a diamond outside the trap, in the presence of a magnetic field
B ~ 30 G. Eight transitions, corresponding to the projections of the B field
onto the four NV orientations, are observed, with typical spin decoherence
rates $1/T_2^* = 7$ MHz.

We now measure the diamond rotation induced by the N = 10⁹ NV
electronic spins inside the diamond, with the same optical read-out as
for the librational mode detection, as depicted in Fig. 2A, a. The
expected magnitude of the spin torque is $\Gamma_s = \hbar N \gamma_e B S_z \sim 10^{-\ldots}$ N m. Here
$S_z$ is the population in one of the magnetic states, determined by the
competition between the microwave and laser polarization (both at
rates in the 100 kHz range, see Methods and Extended Data Fig. 5). This
torque gives a displacement of the particle angle in the trap,
$\delta\phi = \Gamma_s/(I\omega_\phi^2) \approx 10$ mrad, where $I \approx 10^{-22}$ kg m² is the particle moment of
inertia. As can be seen in Fig. 2A, b, sweeping a microwave frequency
around the spin resonances indeed produces conspicuous features.
Once in the magnetic state $|m_s' = -1\rangle$ or $|m_s' = +1\rangle$, the NV centres
tend to align or anti-align the diamond orientation to the magnetic
field, which is manifest in the anti-correlation between the detected
frequency is then no longer resonant: the spin relaxes to the ground state and
the diamond returns to its initial position. The red wavy arrow represents the
polarization at a rate γ from the excited state to the ground state induced by
the green laser. B, Measurements of the three librational modes undergoing
Brownian motion at 1 mbar of vacuum pressure (top) and of the typical electron
spin resonances from the NV ensemble within a microdiamond outside the trap
using standard optically detected magnetic resonance (ODMR) at 30 G
(bottom; PL, photoluminescence). The filled circles label the four orientations
of the NV spins. Green (blue) circles correspond to the $m_s = 0$ to $m_s = -1$ ($m_s = +1$)
transitions. Solid lines are a fit to the data.


intensity levels for all pairs of transitions. A standard ODMR also measured
under the same magnetic field amplitude and measurement time
(see Fig. 2A, c) demonstrates perfect correlation of the frequencies of
the peaks in the two measurements.

This spin-mechanical effect is in fact much richer than a static spin- 
dependent torque. As shown in Fig. 1A, the NV centres are magnetized 
through a microwave tone whose detuning from the NV resonances 
changes as the diamond rotates. To first order, such a torque will 
increase (decrease) the confinement of the Paul trap if the microwave 
frequency is blue (red) detuned from the spin resonance at the equilibrium
angular position. Further, since the spin lifetime is of the order
of the libration period, a delay between the NV magnetization and the
angular oscillation, observed previously, can indeed induce a torque
that depends on the velocity, in close analogy with opto-mechanical
schemes and with Sisyphus cooling of cold atoms. The net result
is a pronounced cooling (heating) of the diamond motion when the
microwave is red (blue) detuned from the spin resonance, as sketched
in Fig. 2B. In order to observe such spin-spring and spin-cooling effects,
we monitor the librational power spectrum as a function of the microwave
detuning from the electronic spin resonance. Figure 2B also shows
the result of measurements taken for three different microwave fre- 
quencies. A strongly modified spring and damping of the mechanical 
mode are observed. Assuming that the initial temperature is 300 K 
(see Methods), the resulting temperature after spin-cooling is here 
80 K. Figure 2C shows measurements of the damping rate and spring 
effects as a function of microwave frequency in good agreement with 
a theoretical model (see Methods and Extended Data Fig. 1). Cooling 
is ultimately limited by heating from the microwave excitation of the 
motion on the blue side. This could be eliminated by increasing the 
trapping frequency $\omega_\phi/2\pi$ above the NV spin-transition linewidth.

We now make a step into a regime where the spin-mechanical interaction
induces nonlinear effects on the librational mode. With a stronger
spin torque (see Methods), Fig. 3A, a displays the expected bistable
behaviour for the angular degree of freedom when the microwave
Fig. 2 | Spin-dependent torque and cooling of a levitating diamond.
A, Rotation of a diamond particle when NV centres are in a magnetic state.
a, Sketch of the laser beam deflection induced by the NV spin torque. APD,
avalanche photodiode. b, Detected APD count rate as a function of the
microwave frequency. The green (blue) filled circles correspond to the $m_s = 0$ to
$m_s = -1$ ($m_s = +1$) transitions, while the green (blue) bars highlight the NV
orientations that rotate the diamond as per sketch a. c, Corresponding ODMR.
a.u., arbitrary units. B, Left, cooling/heating cycle of the librational motion
induced by the spin-mechanical coupling. Right, power spectrum of the


frequency is scanned across the spin resonance. The angle can be found 
at two metastable positions A or B depending on the history of the 
angular trajectory (see Supplementary Information). The hysteresis 
behaviour is indeed observed in the experiment, and shown in Fig. 3A, b.
detected light intensity reflected from the diamond surface when the
microwave frequency is tuned to the blue (trace i), to the centre (trace ii), and to
the red (trace iii) of the spin resonance. The alignment of the reflected light
from the diamond surface in the fibre was optimized to only let these two
librational modes appear in the power spectrum. Note that the particle is here
different from the one used in Fig. 1B. PSD, power spectral density. C, Effective
damping rates (left) and librational mode frequencies $\omega_\phi/2\pi$ (right) as a
function of the microwave detuning. Lines show a fit to the experimental data
using numerical simulations.


The evolution of the particle orientation over time at a fixed microwave 
tone is also plotted in Fig. 3A, c. We note that the particle orientation 
jumps from site A to B in a seemingly unpredictable manner owing
to random kicks given to the particle. The average population at the 
Fig. 3 | Nonlinear spin-motion dynamics. A, Bistability. a, Evolution of the
particle angle as a function of the microwave detuning (see below for points A
and B). b, Hysteresis behaviour of the particle orientation when the microwave
signal is scanned from the red to the blue (blue curve) or from the blue to the
red (red curve), as indicated by the arrows. c, Particle orientation as a function
of time for a fixed microwave frequency tuned to the hysteretic frequencies
(2.625 GHz), showing angular jumps between two stable sites A and B.
B, Phonon lasing. a, Evolution towards lasing of the power spectrum of the
librational motion as a function of the microwave power. b, Oscillator energy as
a function of microwave intensity using microwave powers ranging from
−44 dBm to −16 dBm in steps of 4 dB. The dashed line is a fit to the data.
c, Histogram of the Brownian and lasing angular motions.


angular position B can also be studied as a function of microwave detun- 
ing, and was shown to increase as the microwave frequency is tuned 
towards the blue side of the spin resonance (see Extended Data Fig. 6 
and Supplementary Information). 

We now set the microwave frequency to the blue side of the spin 
resonance in this strong spin-torque regime. Figure 3B, a shows the 
power spectral density as a function of the microwave pump power, 
where a transition from Brownian motion to a self-sustained oscillation
is observed (see also Extended Data Fig. 3). Such a lasing-like action
of a mechanical oscillator was observed in the first radiation pressure
cooling experiments, with proposed applications in metrology. The
spin-mechanical gain that enables such lasing action here is provided 
by blue-detuned microwave excitation, which amplifies the angular 
motion up to a point where losses are compensated by the magnetic 
gain (see Extended Data Fig. 2 for numerical results). The oscillator 
energy as a function of the microwave power is shown in Fig. 3B, b. A 
lasing threshold is observed at 6 mW of microwave excitation. Another 
signature of mechanical lasing is shown in Fig. 3B, c, which displays the 
probability distribution of the angular degree of freedom with and 
without microwaves. Under blue-detuned microwave excitation, the
probability distribution departs from the Gaussian process (red curve) 
for Brownian motion, and turns into the characteristic probability 
distribution of a coherent oscillation (blue curve). This effect shows 
that the librational mode can operate stably, deep in the nonlinear 
regime, and highlights further the analogy between the present spin- 
mechanical platform and opto-mechanical systems. 

Coupling individual spins to the motion of a macroscopic oscillator 
will have far-reaching applications in fundamental science, quantum 
information and metrology. The present spin-dependent torque itself 
may be used for detecting atomic defects with electron spins that can- 
not be efficiently detected through ODMR. Further, the approach may 
also be applied to other torsional nano-mechanical platforms, which
can exploit the long NV spin-lattice relaxation at low temperatures for
longer interrogation times and efficient cooling. Last, operation in the
resolved sideband regime, where $\omega_\phi/2\pi > 1/T_2^*$, could be realized
after modest improvements to the present set-up. We estimate that
using a 1-µm-diameter pure diamond grown by chemical vapour deposition
(CVD) attached to a 1-µm-diameter ferromagnet would enable
the resolved sideband regime to be entered for this hybrid structure.
Librational frequencies $\omega_\phi/2\pi$ above 200 kHz have indeed been
observed recently and NV centres with $1/T_2^* = 50$ kHz electron-spin
decoherence rates can readily be obtained in CVD-grown microdiamonds
enriched in ¹²C. Entering this regime would offer the immediate
prospects of ground-state cooling the diamond libration and multipartite
spin entanglement, and would provide strong impetus to bridge
the gap between trapped particles and trapped atoms.
Methods


Microdiamond properties 

The diamonds that we use are in the form of a powder, with particles that
have a diameter of 15 µm. They are supplied by the company Adamas,
which produces diamonds with a concentration of NV centres in the
range 3-4 p.p.m., corresponding to (1.5-2) × 10⁹ NV centres per microdiamond.
Using the same collection optics as in earlier work, we
estimated the number of optically addressed NV centres to lie in that
same range. This is 4 to 5 orders of magnitude larger than the concentration
used in previously reported experiments, where no spin
torque was observed. Under continuous laser excitation at around
10 µW, our diamonds started to heat up at 0.1 mbar, which is similar to
the pressures used in those experiments: this points to the role played by
impurities other than NV centres in the heating observed there.
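The order of magnitude of the NV number, and of the moment of inertia that enters the torque estimate in the main text, can be cross-checked with a back-of-the-envelope calculation. The sketch below assumes a solid sphere of 15 µm diameter and textbook values for the density and molar mass of diamond; these values and the function names are our assumptions, and real particles from a powder are irregular, so only the order of magnitude is meaningful.

```python
import math

AVOGADRO = 6.02214076e23        # 1/mol
DIAMOND_DENSITY = 3510.0        # kg/m^3, assumed textbook value
CARBON_MOLAR_MASS = 12.011e-3   # kg/mol


def sphere_volume(diameter):
    return math.pi * diameter**3 / 6.0


def nv_count(diameter, concentration_ppm):
    """Number of NV centres for a given NV concentration (parts per million of atoms)."""
    atoms = DIAMOND_DENSITY * sphere_volume(diameter) / CARBON_MOLAR_MASS * AVOGADRO
    return atoms * concentration_ppm * 1e-6


def sphere_moment_of_inertia(diameter):
    """I = (2/5) m r^2 for a homogeneous solid sphere."""
    mass = DIAMOND_DENSITY * sphere_volume(diameter)
    return 0.4 * mass * (diameter / 2.0) ** 2


DIAMETER = 15e-6  # m
n_low, n_high = nv_count(DIAMETER, 3.0), nv_count(DIAMETER, 4.0)
inertia = sphere_moment_of_inertia(DIAMETER)

print(f"N between {n_low:.2e} and {n_high:.2e}; I = {inertia:.2e} kg m^2")
```

The spherical estimate lands at about 10⁹ NV centres, the same order as the range quoted above, and gives a moment of inertia of order 10⁻²² kg m².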


The Paul trap 

We operate with a Paul trap that is similar to the one used previously, except
that the particles are stably trapped at the bottleneck region of the trap, 
where both the electric field gradient and the anisotropy are stronger, 
yielding higher librational confinement. The pressure we operate at 
is 1 bar for the spin-torque measurements shown in Fig. 2, and in the 
millibar range for the cooling and phonon lasing experiments. Below 
1 mbar, the diamond starts to rotate owing to a locking mechanism 
induced by the Paul trap drive, making it impossible to observe the 
spin-dependent torque, which relies on very stable libration. 


NV spin polarization and read-out 

Owing to an intersystem crossing in the excited state of the NV centres,
the electronic ground state |0⟩ is brighter than the |±1⟩ states under
green laser illumination. This provides a means to read out the Zeeman
splitting by scanning a microwave tone around the resonance, carrying 
out ODMR. Here, the microwave is applied directly to the trapping 
electrode, which provides an efficient means to excite the spins. The 
photoluminescence is detected using standard confocal microscopy. 
We use about 100 µW of laser light at 532 nm to polarize the NV centres.
The laser is focused via a lens inside the vacuum chamber, which has a
numerical aperture of 0.5 and a working distance of 8 mm. The focal
point of the laser is kept a few tens of micrometres away from the microdiamond
to mitigate the effect of radiation pressure and to enable laser
excitation of the whole diamond. To measure the polarization rates
to the ground and to the excited magnetic states, we carry out the
sequence depicted in Extended Data Fig. 5a. The photoluminescence
rate is measured as a function of the time t for both sequences and is
plotted in Extended Data Fig. 5b. The laser-induced polarization rate
to the $m_s = 0$ state is 3.3 kHz. The microwave polarization rate
$\Gamma = \Omega^2 T_2^*$ (see Supplementary Information) to the magnetic state
$m_s = -1$ is found to be 8 kHz when using −5 dBm of microwave power
measured before a 25 dB amplifying stage. An estimate based on both
the ODMR width and a Ramsey sequence yields $T_2^* = 70$ ns, implying
$\Omega/2\pi = 60$ kHz.
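The numbers in this paragraph can be tied together with a short calculation. The sketch below assumes that the microwave polarization rate, the Rabi frequency Ω and the dephasing time are related by rate = Ω² T₂* (our reading of the relation referred to in the text, with the rate taken in s⁻¹), and it uses the standard dBm-to-milliwatt conversion. The function names are hypothetical.

```python
import math


def rabi_frequency(polarization_rate, t2_star):
    """Angular Rabi frequency from rate = Omega^2 * T2* (assumed relation), in rad/s."""
    return math.sqrt(polarization_rate / t2_star)


def dbm_to_milliwatt(power_dbm):
    """Standard conversion: P[mW] = 10^(P[dBm] / 10)."""
    return 10.0 ** (power_dbm / 10.0)


RATE = 8e3        # 1/s, microwave polarization rate quoted in the text
T2_STAR = 70e-9   # s, dephasing time quoted in the text

omega = rabi_frequency(RATE, T2_STAR)
rabi_khz = omega / (2.0 * math.pi) / 1e3

# -5 dBm measured before a 25 dB amplifier corresponds to 20 dBm after it
amplified_mw = dbm_to_milliwatt(-5.0 + 25.0)

print(f"Omega/2pi = {rabi_khz:.1f} kHz, amplified power = {amplified_mw:.0f} mW")
```

Under these assumptions the Rabi frequency comes out in the 50-60 kHz range, consistent with the value quoted above to within its stated precision.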

The degree of spin polarization cannot be estimated precisely with- 
out using a full numerical model and the 8 rate equations including 
mixing by the magnetic field transverse component. The magnetic field 
transverse component reduces the polarization time owing to mixing 
of electron spin states both in the ground and excited levels. This 
enhances the probability of non-radiative crossing to the metastable 
level and reduces the ODMR contrast by about 30%. Overall, this reduces the
degree of optical polarization to the $m_s = 0$ spin state to about 60%.


Detection and analysis of the libration 

The diamond motion is detected by collecting the back-reflected green 
light from the diamond surface, separated from the excitation light 
using a polarizing beamsplitter. The best sensitivity is achieved by taking
advantage of the speckle pattern produced by the rough surface of
the microdiamond under coherent illumination. At the particle image
plane, which is located a few tens of centimetres away from the particle,
an image is formed with an additional speckle feature. To detect the 
diamond motion, we focus a small area of this image onto a single-mode 
optical fibre and detect the photons transmitted through the fibre with 
a single-photon avalanche photodiode. The detected signal is then 
highly sensitive to the particle position and orientation. 


Angular displacement sensitivity 

For a given levitating particle, we can optimize, in real time, the change
in the optical signal coming from the angular displacement of the particle
by selecting the most favourable region of the particle image.
To do this, we look at our optical signal while switching a microwave 
field tuned to one ODMR transition at a frequency of 1 Hz. Alignment 
is done by maximizing the change in the coupled light intensity as the 
diamond jumps between two angular positions. The linearity of the 
coupled light with the rotating angles can finally be assessed by looking 
at a higher order of the harmonic motion once the libration frequen- 
cies are identified. 

While being a sensitive measurement of the angular displacement, 
our technique does not give an absolute measurement of the angle 
change. The spin-torque vector $\Gamma_s$ is orthogonal to the plane defined
by the magnetic field and the NV axis (it tends to align the NV axis to
the B field). However, because the angular confinement is not isotropic,
the particle rotation axis is not necessarily collinear with the spin 
torque. Determining the exact three-dimensional rotational dynamics 
of the particle would necessitate knowledge of the orientation of the 
NV axes with respect to the principal axes of the angular motion. 

Using NV magnetometry, the mechanically detected spin resonance
can nonetheless be used to relate the optical signal change to the angular
displacement of the particle. A set of three mechanically detected
spin resonances corresponding to three different microwave powers 
are shown in Extended Data Fig. 4a, under a magnetic field of 144 G. 
The minimum of each curve falls on the dashed line. The lower panel 
of Extended Data Fig. 4a is a theoretical curve where the angle between 
the NV axis and the magnetic field direction is plotted as a function 
of the frequency of the NV spin transition. This curve is obtained by 
diagonalizing the NV spin Hamiltonian in the presence of a magnetic 
field of 144 G. Since the maximum magnetization of the NV spins is 
obtained when the microwave field is resonant with the spin transition, 
one can relate the maximal change in the optical signal (ΔS) to the variation
of the angle between the NV axis and the magnetic field direction
($\Delta\theta_{\mathrm{NV}}$). Doing so, we obtain here a resolution of 43 mrad Mcounts$^{-1}$ s.
Extended Data Fig. 4b shows a time trace of the optical signal upon
Brownian motion of the particle. From the standard deviation of this
signal and the above calibration, we obtain an angular displacement
sensitivity of 0.3 mrad Hz$^{-1/2}$.
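The theoretical calibration curve mentioned above (angle versus spin-transition frequency) comes from diagonalizing the NV spin Hamiltonian. A minimal sketch of such a diagonalization is given below. It assumes the standard ground-state form H = D S_z² + γ_e B (cos θ S_z + sin θ S_x) with D = 2870 MHz and γ_e ≈ 2.8 MHz/G, and it neglects strain and hyperfine terms; the function name and these simplifications are ours, not necessarily those of the original analysis.

```python
import numpy as np

D_MHZ = 2870.0           # zero-field splitting (MHz)
GAMMA_E_MHZ_PER_G = 2.8  # electron gyromagnetic ratio (MHz/G), approximate

# Spin-1 operators in the basis (m_s = +1, 0, -1)
SZ = np.diag([1.0, 0.0, -1.0])
SX = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]]) / np.sqrt(2.0)


def nv_transitions(b_gauss, theta):
    """Return the two ground-state transition frequencies (MHz).

    theta is the angle (rad) between the magnetic field and the NV axis.
    """
    zeeman = GAMMA_E_MHZ_PER_G * b_gauss * (np.cos(theta) * SZ + np.sin(theta) * SX)
    energies = np.linalg.eigvalsh(D_MHZ * SZ @ SZ + zeeman)  # sorted ascending
    return energies[1] - energies[0], energies[2] - energies[0]


# Calibration curve at 144 G: lower transition frequency versus angle
angles = np.linspace(0.0, np.pi / 2.0, 91)
lower = np.array([nv_transitions(144.0, th)[0] for th in angles])
upper = np.array([nv_transitions(144.0, th)[1] for th in angles])
```

Because the lower transition frequency varies monotonically with the angle in this model, a measured resonance frequency can be inverted (for example with `np.interp(f, lower, angles)`) to obtain the angle between the NV axis and the field.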

These numbers are however only upper bounds for our resolution 
and sensitivity. To explain why this is the case, Extended Data Fig. 4c 
shows a sketch of the angular motion of the diamond after magnet- 
izing one class of NV spins. For simplicity, we consider rotation about 
two axes here. In a reference frame with axes given by the principal
librational mode directions, we can parametrize the orientation of the
NV axis in a subspace defined by the two angular coordinates $\theta_x$ and
$\theta_y$. The orientation without magnetization (M = 0) is given by the trap
and particle geometry and labelled O. The point B in this space is the
direction of the magnetic field. Upon magnetization, a torque is applied
to the particle such that the orientation follows the OB trajectory over
time. However, owing to different confinement of the librational modes
$\omega_x$ and $\omega_y$ along the x and y axes, the angular motion takes place along
a different trajectory.

In our experiments, the orientation of the magnetic field and NV axes 
relative to the principal axis of the libration is unknown. This prevents 
us from fully calibrating our detected angular motion. Nevertheless, 
provided that the detection is optimized to the librational mode having
the highest confinement, we can ensure that the detected angular displacement
is smaller than the angular displacement $\theta_{\mathrm{NV}}$ sensed by
the NV spins. This can be seen in Extended Data Fig. 4c, where we note
“NV”, the equilibrium position when , > @,. Our calibration method 
thus gives an upper bound to the obtained resolution and sensitivity. 
Optimization of the detection is performed by monitoring the power 
spectrum and tuning the speckle angle at the entrance of the fibre to 
maximize the power spectrum of the mode with the largest frequency. 
Extended Data Fig. 4d shows the power spectrum of the Brownian 
motion for two detection alignments. In the red trace, all three libra- 
tional modes, indicated with black arrows, are clearly visible. In the 
blue trace, the detection is tuned to be mainly sensitive to the mode 
with the highest confinement frequency. The latter detection tuning 
is used for the data shown in Extended Data Fig 4a and b. 

The sensitivity could be improved by collecting the entire speckle
pattern using a camera rather than just a fraction of it, as we do now.
Using a shorter laser wavelength would also straightforwardly improve
the sensitivity. Another technical limitation comes from the motion of
the trapped diamond in modes other than the librational mode of
interest, which adds noise to the angular displacement signal. In this
regard, active stabilization of the centre of mass would greatly
increase the sensitivity.


Power spectral density 

Using the detection method described above, motional frequencies can be
observed by sending the detected signal to a spectrum analyser. Under
vacuum (1 mbar), the power spectrum exhibits narrow peaks at the
trapping frequencies of the motional modes, which are driven by
Brownian motion (see Fig. 1B in the main text). For each librational
mode, the power spectrum is fitted by the formula obtained in the
Supplementary Information:


$$S_\theta(\omega) = \frac{2\gamma kT}{I\left[(\omega_\theta^2-\omega^2)^2+\gamma^2\omega^2\right]} \qquad (1)$$
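
As a rough numerical check of equation (1), the sketch below evaluates a thermal Lorentzian spectrum of the form S_θ(ω) = 2γkT / (I[(ω_θ² − ω²)² + γ²ω²]) and integrates it to recover the equipartition value kT/(Iω_θ²). The moment of inertia, the damping rate and all function names are illustrative assumptions, not values or code from the experiment.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def libration_psd(omega, omega_theta, gamma, temperature, inertia):
    """Thermal angular power spectrum of one librational mode (form of eq. 1)."""
    denom = inertia * ((omega_theta**2 - omega**2) ** 2 + gamma**2 * omega**2)
    return 2.0 * gamma * K_B * temperature / denom


def mean_square_angle(omega_theta, gamma, temperature, inertia, steps=200_000):
    """Integrate the spectrum with the trapezoidal rule.

    The spectrum is even in omega, so the two-sided integral divided by
    2*pi equals the one-sided integral divided by pi.
    """
    omega_max = 20.0 * omega_theta  # tail beyond this is negligible
    d_omega = omega_max / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * libration_psd(
            i * d_omega, omega_theta, gamma, temperature, inertia
        )
    return total * d_omega / math.pi


# Hypothetical parameters: 480 Hz mode, 20 Hz damping, 300 K, I = 1e-22 kg m^2.
OMEGA_THETA = 2 * math.pi * 480.0
GAMMA = 2 * math.pi * 20.0
INERTIA = 1e-22
numeric = mean_square_angle(OMEGA_THETA, GAMMA, 300.0, INERTIA)
equipartition = K_B * 300.0 / (INERTIA * OMEGA_THETA**2)
```

The agreement between `numeric` and `equipartition` reflects why the area under a librational peak can serve as a proxy for the mode temperature, provided the detection sensitivity stays fixed.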


The librational modes can in fact be unambiguously identified (and
separated from the centre-of-mass modes) using the torque induced by
the NV centres. By switching a microwave field tuned to one spin
resonance on and off with the same period as that of one diamond
libration, one performs parametric excitation of that librational
mode. In our experiments, a sequence of five microwave pulses is enough
to displace the angle above the Brownian thermal noise. Following a
parametric excitation sequence, the ‘ring-down’, or decay, of the
diamond orientation is observed. A typical decay curve is shown in
Extended Data Fig. 1a. We typically find librational frequencies in the
range 100 Hz to 1 kHz.


Parameters used for the spin-dependent torque measurements 

The ODMR and spin-mechanical measurement scans shown in Fig. 2A of the
main text are taken under atmospheric pressure. The green laser power
was 330 µW and the microwave power was set to 0 dBm. The magnetic field
is around 95 G. For the mechanically detected spin resonance
(Fig. 2A, b), the microwave detuning is scanned in 2 MHz steps with a
duration of 10 ms per point. During those 10 ms, the diamond
orientation has enough time to reach its equilibrium position and the
spin-torque
effect can be observed. The average count rate is 2.3 × 10^6 s^−1 for a
total averaging time of 10 min. For the ODMR trace (Fig. 2A, c), the
microwave detuning is scanned in steps of 2 MHz with a duration of 1 ms
per point. For each point, the microwave field is switched off for the
first 0.5 ms and switched on for the last 0.5 ms, during which the
signal is acquired. This prevents mechanical effects from altering the
detected photoluminescence signal from the NV centres. The
photoluminescence count rate is 5 × 10^6 s^−1 and the total averaging
time for this measurement was 3 h.


Estimation of the temperature 
The temperature associated with the librational modes can only be
estimated. Obtaining a precise temperature value would require
knowledge of the moment of inertia of the particle, which is prone to
strong systematic errors. The standard method is to vary the pressure
while observing the power spectrum: over the pressure range where its
area is constant (to satisfy Liouville’s theorem under adiabatic
transformation), the librational mode temperature is known to be 300 K,
as the mode is thermalized with the gas. In our case, pressure
variations slightly change the orientation and position of the trapped
particle and, incidentally, the sensitivity to angular motion. This
prevents such a method from being used. However, two observations
support the conclusion that the external degrees of freedom of the
particle are thermalized at 300 K when operating in the millibar
pressure range. First, we measured the internal temperature of the
particle at our typical laser powers via NV thermometry, and found it
to be close to 300 K. This ensures that no heating of the librational
modes comes from heating of the gas surrounding the particle. Second,
we observed that heating of the particle by the laser starts below
0.1 mbar, similar to what was measured using diamonds doped with an NV
concentration three orders of magnitude smaller. Several sources of
noise could also heat up the particle, such as the laser-induced torque
or charge fluctuations. Heating by the former can be excluded, as no
noticeable change in the shape of the power spectrum occurs when the
laser power is increased up to 1 mW.


Parameters and calibration 

The power spectrum of the detected librational modes depends strongly
on the particle angle. For the same motional amplitude, a change in the
particle angle potentially implies a different speckle pattern, which
in turn changes the sensitivity of the power spectrum. Since the
particle angle changes with the microwave detuning and power, different
power spectra cannot be directly compared when the parameters are
changed. Traces i, ii and iii in the right panel of Fig. 2B in the main
text have thus been obtained at the same particle angle to enable
quantitative comparison between their areas. Operating at the same
particle angle was ensured by performing a resonant spin-mechanical
detection at different microwave amplitudes and choosing pairs of
microwave frequencies and powers that correspond to the same count
rates. As shown in the data in Extended Data Fig. 1b, we chose
microwave detunings corresponding to the points 1, 2, 3 for the two
traces i and ii taken at microwave powers of −20 dBm and −10 dBm,
respectively. The frequencies, which are 2.617 GHz, 2.623 GHz and
2.634 GHz, respectively on the blue side, on resonance and on the red
side of the spin resonance, all correspond to the same angle under
these power conditions. A fit to the experimental curves in Fig. 2B was
obtained using the formula


$$S_\theta(\omega) = \frac{2\gamma_{\rm eff} kT}{I\left[(\omega_{\rm eff}^2-\omega^2)^2+\gamma_{\rm eff}^2\omega^2\right]} \qquad (2)$$


The dependence of the damping γ_eff and of the frequency shift of ω_eff
on the microwave detuning shown in Fig. 2C was obtained using
parametric excitation of the librational mode at 480 Hz. The microwave
power is −10 dBm and, for this measurement, the above-mentioned
calibration issue (change in the sensitivity when the microwave
detuning is varied) is not relevant. To extract the damping and shifts,
the resulting ring-down was fitted by the formula:


$$S(t) = A_1 \sin(\omega_{\rm eff} t + \phi_1)\exp(-\gamma_{\rm eff} t/2) + A_2 \sin(\omega_2 t + \phi_2)\exp(-\gamma_2 t/2) + A_0 \qquad (3)$$

where the second exponentially damped sine term takes into account the
slightly excited librational mode at 590 Hz. Three of these ring-down
traces are shown in Extended Data Fig. 1c. For each curve, the
averaging time is around 100 s. An estimation of the temperature relies
on comparing the damping with and without spin-cooling, using the
relation T_eff = Tγ/γ_eff, valid for small spin-spring frequency
shifts.
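
The ring-down model and the damping-based temperature estimate can be sketched as follows. This is a minimal illustration, assuming the two-mode damped-sine form of the fit and the standard cold-damping relation T_eff = Tγ/γ_eff; the function and parameter names are hypothetical and the numbers in the usage note are not measured values.

```python
import math


def ring_down(t, a1, omega_eff, phi1, gamma_eff, a2, omega2, phi2, gamma2, a0):
    """Sum of two exponentially damped sines plus an offset.

    The first term models the parametrically excited mode, the second a
    weakly excited neighbouring mode.
    """
    main = a1 * math.sin(omega_eff * t + phi1) * math.exp(-gamma_eff * t / 2)
    neighbour = a2 * math.sin(omega2 * t + phi2) * math.exp(-gamma2 * t / 2)
    return main + neighbour + a0


def effective_temperature(bath_temperature, gamma, gamma_eff):
    """Estimate T_eff = T * gamma / gamma_eff (assumed cold-damping relation)."""
    return bath_temperature * gamma / gamma_eff
```

For example, if the damping rate were to triple when the spin-cooling is switched on, `effective_temperature(300.0, 10.0, 30.0)` would give an estimate of 100 K under this relation.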


Modelling of the experiment

For most of this paper, we modelled the experiment numerically using 
Monte Carlo simulations that include the full three-level structure of the 
NV spin-1 system in the ground state (see Supplementary Information).
For the spin-cooling and spin-spring effects shown in Fig. 2C, the number 
of NV centres, polarization rate, Rabi frequencies, and angle between the 
NV centres and the main axis of the diamond are left as free parameters. 


Bistability and phonon-lasing 

The curve in Fig. 3A, a, shows the angular evolution of the particle as
a function of the microwave detuning obtained using similar parameters
to those in the linear regime, but using microwave and laser powers
that best fitted the data in Fig. 3A, b, and a lower trapping frequency
(ω_θ/2π = 240 Hz). Figure 3A, c, shows the evolution of the librational
mode angle as a function of time in this regime. Several such curves
were obtained for different microwave detunings, and are shown in
Extended Data Fig. 6. A Monte Carlo simulation was also performed using
our experimental parameters, with the microwave tuned to the red side,
and shows similar jumps between the two stable points A and B. The data
shown in Fig. 3B, a, of the main text show the evolution of the power
spectra for different microwave signal powers, when the microwave
frequency is tuned to the blue side of the ODMR transition. The onset
of instability is seen at approximately 0 dBm. For a quantitative
estimate of the threshold, we compute the area below the librational
peak as a function of microwave detuning. This is shown in Fig. 3B, b.
Note that here the sensitivity of the power spectra to the angle may
induce some systematic errors. For these measurements, we fitted the
data with the numerical model and found good agreement with the
numerical analysis, but a quantitative comparison with the experiment
is difficult owing to the above-mentioned angle-dependent sensitivity.


Data availability 


The data that support the findings of this study are available from the 
corresponding author upon reasonable request. 
low-frequency librational mode (170 Hz) at 1 mbar (bottom). b,
Spin-mechanical resonance for two different microwave powers: trace i,
−20 dBm; trace ii, −10 dBm. c, Reflected light field amplitude as a
function of time, for the

two sines at the frequency of the parametrically excited librational
mode (480 Hz) and of the closest librational mode (590 Hz) with an
amplitude that is 30 times smaller.

200 and 250 kHz, respectively. b, Evolution of the angle as a function
of time

traces i, ii and iii, respectively.

Extended Data Fig. 3 | Phonon-lasing as a function of time. Shown is
the reflected light field amplitude as a function of time on sudden
switch-on of a microwave signal at a time t = 0.02 s to the blue side
of the spin transition. The microwave power is above threshold so
lasing can be observed after the Brownian motion signal (seen before
t = 0.02 s). This curve was used to plot the histogram of Fig. 3C in
the main text.


[Extended Data Fig. 4 plots; axes: Signal (Mcounts/s), θ_NV (rad), Frequency (GHz)]


Extended Data Fig. 4 | Calibration of sensitivity using NV
magnetometry. a, Upper panel, mechanically detected spin resonance for
three different microwave powers. The dashed line is the locus of the
signal minima. Lower panel, angle between the NV axis and magnetic
field direction versus NV spin transition frequency. ΔS is the maximal
change of the optical signal. Δθ_NV is the maximum angle between the NV
axis and the magnetic field direction. b, Optical signal as a function
of time. c, Left panel, sketch depicting the angular motion of the
diamond on magnetization of one class of NV spins using only two angles
θ_x and θ_y for simplicity. B is the orientation of the magnetic field.
Right panel, angular trajectory represented in (θ_x, θ_y) space. O
denotes the particle orientation without NV magnetization (M = 0). The
red (green) line is the trajectory in the isotropic (anisotropic) case
(see Methods). θ_d is the detected angle. d, Power spectra of the
librational Brownian motion for two different speckle alignments taken
with a resolution bandwidth of 1 Hz. The red curve shows all three
librational modes. For the blue curve, the detection is tuned to be
mainly sensitive to the mode with the highest confinement frequency.
The latter detection setting is used for the data shown in a and b.

Extended Data Fig. 5 | Measurement of spin polarization rates. a,
Sequences employed to measure the laser-induced polarization rate to
the ground state (i) and the microwave-induced polarization rate to the
magnetic state m_s = −1 (ii). The laser is kept on at all times for
both sequences. b, Trace i shows the photoluminescence (PL) rate at a
time τ after having turned off the microwave signal. An exponential fit
to the data gives a laser polarization time of 300 µs. Trace ii shows
the photoluminescence rate at a time τ after having turned on the
microwave signal. The polarization time to the magnetic state is here
124 µs.

[Plot axes: normalised PL versus time τ]

Extended Data Fig. 6 | Particle angle dynamics for different microwave
detunings in the bistable regime. a-f, Left panel, experimental
observation of the reflected signal amplitude as a function of time for
microwave frequencies

number of counts within each bin.

The ability to communicate quantum information over long distances is of central
importance in quantum science and engineering. Although some applications of
quantum communication such as secure quantum key distribution are already
being successfully deployed, their range is currently limited by photon losses and
cannot be extended using straightforward measure-and-repeat strategies without
compromising unconditional security. Alternatively, quantum repeaters, which
utilize intermediate quantum memory nodes and error correction techniques, can
extend the range of quantum channels. However, their implementation remains an
outstanding challenge (refs 10–16), requiring a combination of efficient and
high-fidelity quantum memories, gate operations, and measurements. Here we use a
single solid-state spin memory integrated in a nanophotonic diamond resonator to
implement asynchronous photonic Bell-state measurements, which are a key
component of quantum repeaters. In a proof-of-principle experiment, we
demonstrate high-fidelity operation that effectively enables quantum
communication at a rate that surpasses the ideal loss-equivalent
direct-transmission method while operating at megahertz clock speeds. These
results represent a crucial step towards practical quantum repeaters and
large-scale quantum networks (refs 20, 21).


Efficient, long-lived quantum memory nodes are expected to play an
essential part in extending the range of quantum communication, as they
enable asynchronous quantum logic operations, such as Bell-state
measurements (BSMs), between optical photons. Such an asynchronous BSM
is central to many quantum communication protocols, including the
realization of scalable quantum repeaters with multiple intermediate
nodes. Its elementary operation can be understood by considering a
specific implementation of quantum cryptography illustrated in Fig. 1a.
Here two remote communicating parties, Alice and Bob, try to agree on a
key that is secure against potential eavesdroppers. They each send a
randomly chosen photonic qubit {|±x⟩, |±y⟩} encoded in one of two
conjugate bases (X or Y) across a lossy channel to an untrusted central
node (Charlie), who performs a BSM and reports the result over an
authenticated public channel. After a number of iterations, Alice and
Bob publicly reveal their choice of bases to obtain a correlated bit
string (a sifted key) from the cases when they used a compatible basis.
A potentially secure key can subsequently be distilled provided the BSM
error rate is low enough.
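
The sifting step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the record layout and the function names are hypothetical, and any parity-dependent bit flips that a real protocol would apply are omitted.

```python
import random


def sift_key(rounds):
    """Keep the bits from rounds with a heralded BSM and matching bases.

    Each round is a tuple
    (alice_basis, alice_bit, bob_basis, bob_bit, bsm_success).
    """
    alice_key, bob_key = [], []
    for alice_basis, alice_bit, bob_basis, bob_bit, bsm_success in rounds:
        if bsm_success and alice_basis == bob_basis:
            alice_key.append(alice_bit)
            bob_key.append(bob_bit)
    return alice_key, bob_key


def random_rounds(n_rounds, bsm_success_probability, seed=0):
    """Generate rounds with unbiased, independent basis and bit choices."""
    rng = random.Random(seed)
    return [
        (
            rng.choice("XY"),
            rng.randint(0, 1),
            rng.choice("XY"),
            rng.randint(0, 1),
            rng.random() < bsm_success_probability,
        )
        for _ in range(n_rounds)
    ]
```

With unbiased basis choices, about half of the successful rounds survive sifting, which is the origin of the factor of 1/2 in rate expressions for an unbiased basis choice.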

Although a photonic BSM can be implemented with linear optics and
single-photon detectors, the BSM is only successful in this
‘direct-transmission’ approach when photons from Alice and Bob arrive
simultaneously. Thus, when Alice and Bob are separated by a lossy fibre
with a total transmission probability p_{A→B} ≪ 1, Charlie measures
photon coincidences with a probability also limited by p_{A→B}, leading
to a fundamental bound on the maximum possible distilled key rate of
R_max = p_{A→B}/2 bits per channel use for an unbiased basis choice.
Although linear optical techniques to circumvent this bound are now
being actively explored, they offer only limited improvement and cannot
be scaled beyond a single intermediate node.

Alternatively, this bound can be surpassed using a quantum memory node
at Charlie’s location. In this approach, illustrated in Fig. 1b, the
state of Alice’s photon is stored in the heralded memory while awaiting
receipt of Bob’s photon over the lossy channel. Once the second photon
arrives, a BSM between Alice’s and Bob’s qubits yields a distilled key
rate that for an ideal memory scales as R_s ∝ √p_{A→B}, potentially
leading to substantial improvement over direct transmission.
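
To see how the two scalings diverge as the channel becomes lossier, the following sketch compares the direct-transmission bound p/2 with a square-root scaling. The prefactor of the memory-assisted rate is not given in the text, so it is set to 1 here purely for illustration; the function names are hypothetical.

```python
import math


def direct_transmission_bound(p):
    """Maximum distilled key rate per channel use, R_max = p / 2."""
    return p / 2.0


def ideal_memory_scaling(p, prefactor=1.0):
    """Square-root scaling of an ideal memory-assisted rate (prefactor assumed)."""
    return prefactor * math.sqrt(p)


def scaling_advantage(p, prefactor=1.0):
    """Ratio of the two expressions, which grows as 2 * prefactor / sqrt(p)."""
    return ideal_memory_scaling(p, prefactor) / direct_transmission_bound(p)
```

For a transmission probability of 10^−4, for instance, the ratio of the two expressions is 200 with a unit prefactor; the advantage obtained in a real device depends on the actual prefactor and on memory imperfections.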


Efficient nanophotonic quantum node 

In this work we realize and use a quantum node that enables BSM rates
exceeding those of an ideal system based on linear optics. We focus on
the demonstration and characterization of the BSM node, leaving the
implementation of source-specific technical components of full-scale
quantum key distribution systems, such as decoy states, basis biasing,
a finite key error analysis and a physical separation of Alice and Bob,
for future work. Our realization is based on a single silicon-vacancy
(SiV) colour centre integrated inside a diamond nanophotonic cavity
(Fig. 2a). Its key figure-of-merit, the cooperativity C, describes the
ratio of the interaction rate with individual cavity photons compared
to all dissipation rates. A low mode volume (0.5(λ/n)^3, with wavelength
λ and refractive index n), high quality factor (2 × 10^4), and nanoscale
positioning of SiV centres enable an exceptional C = 105 ± 11. Cavity
photons at 737 nm wavelength are critically coupled to a waveguide
Fig. 1 | Concept of memory-enhanced quantum communication.
a, Quantum communication protocol. Alice and Bob (A and B, respectively)
send qubits encoded in photons to a measurement device (Charlie; C) in
between them. Charlie performs a BSM and announces the result. After
verifying in which rounds Alice and Bob sent qubits in compatible bases, a
sifted key is generated. b, Illustration of memory-enhanced protocol. Photons
arrive at Charlie from A and B at random times over a lossy channel, and are


and adiabatically transferred into a single-mode optical fibre that is
routed to superconducting nanowire single-photon detectors, yielding
a full system detection efficiency of about 85% (Methods). The device
is placed inside a dilution refrigerator, resulting in an electronic spin
quantum memory time T_2 > 0.2 ms at temperatures below 300 mK.

The operating principle of the SiV-cavity-based spin-photon interface
is illustrated in Fig. 2. Spin-dependent modulation of the cavity
reflection at incident probe frequency f_0 (Fig. 2b) results in the
direct observation of electron spin quantum jumps (Fig. 2c, inset),
enabling non-destructive single-shot readout of the spin state
(Fig. 2c) in 30 µs with fidelity F = 0.9998. Coherent control of the
SiV spin qubit (f_Q ≈ 12 GHz) is accomplished using microwave fields
delivered via an on-chip gold coplanar waveguide. We utilize both
optical readout and microwave control to perform projective
feedback-based initialization of the SiV spin into the |↓⟩ state with a
fidelity of F = 0.998 ± 0.001. Spin-dependent cavity reflection also
enables quantum logic operations between an incoming photonic time-bin
qubit, defined by a phase-coherent pair of attenuated laser pulses, and
the spin memory. We characterize this by using the protocol illustrated
in Fig. 2d to generate the spin-photon entangled state
(|e↑⟩ + |l↓⟩)/√2 conditioned on successful reflection of an incoming
single photon with overall heralding efficiency η = 0.423 ± 0.004
(Methods). Here, |e⟩ and |l⟩ denote respectively the presence of a
photon in an early or a late time-bin, separated by δt = 142 ns. We
characterize the entangled state by performing measurements in the
joint spin-photon ZZ and XX bases (Fig. 2e), implementing local
operations on the reflected photonic qubit with a time-delay
interferometer (TDI; Fig. 2a, dashed box). By lowering the average
number of photons ⟨n⟩_m incident on the device during the SiV memory
time, we reduce the possibility that an additional photon reaches the
cavity without being subsequently detected, enabling high spin-photon
gate fidelities for small ⟨n⟩_m (Fig. 2f). For ⟨n⟩_m = 0.002 we measure
a lower bound on the fidelity of the spin-photon entangled state of
F ≥ 0.944 ± 0.008, primarily limited by residual reflections from the
|↓⟩ state.


Asynchronous BSMs 

This spin-photon logic gate can be directly used to herald the storage
of an incoming photonic qubit by interferometrically measuring the
reflected photon in the X basis. To implement a memory-assisted BSM,
we extend this protocol to accommodate a total of N photonic qubit
time-bins within a single initialization of the memory (Fig. 3a). Each



unlikely to arrive simultaneously (rare success indicated in purple), leading to a
low BSM success rate for direct transmission. Despite the overhead time
associated with operating a quantum memory (red), a BSM can be performed
between photons that arrive at Charlie within the memory coherence time T_2,
leading to higher success rates (green). BSM successes and failures are denoted
by dark and light shaded windows respectively for both approaches.


individual time-bin qubit is encoded in the relative amplitudes and
phases of a pair of neighbouring pulses separated by δt. Detection of
a reflected photon heralds the arrival of the photonic qubit formed by
the two interfering pulses without revealing its state. Two such
heralding events, combined with subsequent spin-state readout in the X
basis, constitute a successful BSM on the incident photons. This can be
understood without loss of generality by restricting input photonic
states to be encoded in the relative phase φ between neighbouring
pulses with equal amplitude: (|e⟩ + e^{iφ}|l⟩)/√2 (Fig. 3b). Detection
of the first reflected photon in the X basis teleports its quantum
state onto the spin, resulting in the state
(|↑⟩ + m_1 e^{iφ_1}|↓⟩)/√2, where m_1 = ±1 depending on which detector
registers the photon. Detection of a second photon at a later time
within the electron spin T_2 results in the spin state
(|↑⟩ + m_1 m_2 e^{i(φ_1+φ_2)}|↓⟩)/√2. The phase of this spin state
depends only on the sum of the incoming phases and the product of their
detection outcomes, but not on the individual phases themselves. As a
result, if the photons were sent with phases that meet the condition
φ_1 + φ_2 ∈ {0, π}, a final measurement of the spin in the X basis
(m_3 = ±1) completes an asynchronous BSM, distinguishing two of the
four Bell states based on the total parity m_1 m_2 m_3 = ±1
(Supplementary Information, Extended Data Table 3).
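
The parity rule can be checked with a few lines of linear algebra. The sketch below is an idealized model only: it assumes a perfect memory and lossless heralding, ignores the interleaved microwave pulses, and fixes a sign convention for the outcomes m_1, m_2 and m_3. All function names are hypothetical.

```python
import cmath
import math


def spin_after_two_heralds(phi1, phi2, m1, m2):
    """Amplitudes (up, down) of the spin state after two heralded photons."""
    norm = 1.0 / math.sqrt(2.0)
    return norm, norm * m1 * m2 * cmath.exp(1j * (phi1 + phi2))


def probability_x_plus(up, down):
    """Probability of m3 = +1, i.e. projection onto (|up> + |down>)/sqrt(2)."""
    overlap = (up + down) / math.sqrt(2.0)
    return abs(overlap) ** 2


def total_parity(phi1, phi2, m1, m2):
    """Parity m1 * m2 * m3 for inputs with phi1 + phi2 equal to 0 or pi."""
    p_plus = probability_x_plus(*spin_after_two_heralds(phi1, phi2, m1, m2))
    m3 = 1 if p_plus > 0.5 else -1
    return m1 * m2 * m3
```

In this model the parity equals cos(φ_1 + φ_2): it is +1 whenever the two input phases sum to 0 and −1 whenever they sum to π, independently of the individual heralding outcomes.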

This approach can be directly applied to generate a correlated bit
string within the protocol illustrated in Fig. 1a. We analyse the
system performance by characterizing the overall quantum-bit error rate
(QBER) for N = 124 photonic qubits per memory initialization. We use
several random bit strings of incoming photons from {|±x⟩, |±y⟩} and
observe strong correlations between the resulting BSM outcome and the
initial combination of input qubits for both bases (Fig. 3c). Using
this method, we estimate the average QBER to be E = 0.116 ± 0.002 for
all combinations of random bit strings measured, significantly below
the limit of E_ia = 0.146, which could provide security against
individual attacks (note that the measured error rate is also well
below the minimum average QBER of 0.125 achievable using a linear
optics BSM with weak coherent pulse inputs, see Supplementary
Information). In our experiment, the QBER is affected by technical
imperfections in the preparation of random strings of photonic qubits.
We find specific periodic patterns of photonic qubits to be less prone
to these effects, resulting in a QBER as low as E = 0.097 ± 0.006,
which falls within the threshold of 0.110 corresponding to
unconditional security with a confidence level of 0.986 (Supplementary
Information). We further verify security by testing the Bell-CHSH
inequality
Fig. 2 | Realization of heralded spin-photon gate. a, Schematic of memory-
assisted implementation of Charlie’s measurement device, consisting of a
diamond nanophotonic resonator (grey) containing SiV quantum memory
(blue circle) with an integrated microwave stripline (yellow). Weak pulses
derived from a single laser simulate incoming photons from Alice and Bob
(purple). Reflected photons (red) are detected in a heralding set-up (dashed
box). b, Reflection spectrum of the memory node, showing spin-dependent
device reflectivity. c, Histogram of detected photon numbers during a 30-µs
laser pulse, enabling single-shot readout based on a threshold of 7 photons.


using input states from four different bases, each separated by an
angle of 45° (Supplementary Information). We find that the correlations
between input photons (Fig. 3d) violate the Bell-CHSH inequality
S_± ≤ 2, observing S_+ = 2.21 ± 0.04 and S_− = 2.19 ± 0.04 for positive
and negative BSM parity results, respectively. This result demonstrates
that this device can be used for quantum communication that is secured
by Bell’s theorem.
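
For intuition about the size of the violation, the sketch below evaluates the CHSH combination for a simple visibility model in which the correlation between two settings at Bloch-sphere angles a and b is V cos(a − b). This model, the choice of angles and the function names are assumptions made for illustration and are not the analysis used in the Supplementary Information.

```python
import math


def correlation(angle_a, angle_b, visibility):
    """Correlation E(a, b) = V * cos(a - b) in a simple visibility model."""
    return visibility * math.cos(angle_a - angle_b)


def chsh_value(visibility):
    """CHSH combination for four settings separated by 45 degrees."""
    a, a_prime = 0.0, math.pi / 2
    b, b_prime = math.pi / 4, 3 * math.pi / 4
    return (
        correlation(a, b, visibility)
        - correlation(a, b_prime, visibility)
        + correlation(a_prime, b, visibility)
        + correlation(a_prime, b_prime, visibility)
    )


def visibility_from_chsh(s_value):
    """Invert S = 2 * sqrt(2) * V within the same model."""
    return s_value / (2.0 * math.sqrt(2.0))
```

Within this model the classical bound S = 2 is exceeded only for visibilities above 1/√2 ≈ 0.707, and a value of S near 2.2 would correspond to a visibility of roughly 0.78.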


Benchmarking quantum memory advantage 

To benchmark the performance of memory-assisted quantum communication,
we model an effective channel loss by reducing the mean photon number
⟨n⟩_p incident on the device per photonic qubit. Assuming that Alice
and Bob emit roughly one photon per qubit, this yields an effective
channel transmission probability p_{A→B} = ⟨n⟩_p^2, resulting in the
maximal distilled key rate R_max per channel use for the
direct-transmission approach, given by the red line in Fig. 4. We
emphasize that this is a theoretical upper bound for a
linear-optics-based BSM, assuming ideal single-photon sources and
detectors and balanced basis choices. The measured sifted key rates of
the memory-based device are plotted as open circles in Fig. 4. Owing to
the high overall heralding efficiency and the large number of photonic
qubits per memory time (up to N = 504), the memory-assisted sifted key
rate exceeds the capability of a linear-optics-based BSM device by a
factor of 78.4 ± 0.7 at an effective channel loss of about 88 dB.
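
The quoted channel loss can be cross-checked from numbers given elsewhere in the article. The sketch below assumes, following the Fig. 4 caption, that the mean photon number per memory time is held at 0.02 and shared over N photonic qubits, and that the effective transmission is the square of the mean photon number per qubit. The function names are hypothetical.

```python
import math


def transmission_to_loss_db(transmission):
    """Express a transmission probability as a loss in decibels."""
    return -10.0 * math.log10(transmission)


def loss_db_to_transmission(loss_db):
    """Convert a loss in decibels back to a transmission probability."""
    return 10.0 ** (-loss_db / 10.0)


def effective_transmission(photons_per_memory_time, n_qubits):
    """Effective p = <n>_p**2, with <n>_p = <n>_m / N photons per qubit."""
    photons_per_qubit = photons_per_memory_time / n_qubits
    return photons_per_qubit**2


def direct_transmission_bound(transmission):
    """Direct-transmission key rate bound, R_max = p / 2."""
    return transmission / 2.0


loss_for_504 = transmission_to_loss_db(effective_transmission(0.02, 504))
```

Under these assumptions `loss_for_504` evaluates to approximately 88 dB, in line with the effective channel loss quoted for N = 504.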
Inset, electron spin quantum jumps under weak illumination. d, Schematic of
spin-photon quantum logic operation used to generate and verify spin-
photon entangled state. e, Characterization of resulting spin-photon
correlations in the ZZ and XX bases. Dashed bars show ideal values. f, Measured
spin-photon entanglement fidelity as a function of ⟨n⟩_m, the average incident
photon number during each initialization of the memory. Error bars, 68%
confidence interval (c.i.). See main text and Methods for details of
nomenclature used in this figure.


In practice, errors introduced by the quantum memory node could leak
information to the environment, reducing the quality and potential
security of the sifted key. A shorter secure key can be recovered from
a sifted key with finite QBER using classical error correction and
privacy amplification techniques. The fraction of distilled bits r_s
that can be secure against individual attacks rapidly diminishes as the
QBER approaches E_ia = 0.147. For each value of the effective channel
loss, we estimate the QBER and use it to compute r_s, enabling
extraction of distilled key rates R_s, plotted in black in Fig. 4. Even
after error correction, we find that the memory-assisted distilled key
rate outperforms the ideal limit for the corresponding
direct-transmission implementation by a factor of up to
R_s/R_max = 4.1 ± 0.5 (±0.1 systematic uncertainty, for N = 124). We
further find that this rate also exceeds the fundamental bound on
repeaterless communication, R_s ≤ 1.44 p_{A→B}, with a statistical
confidence level of 99.2% (subject to a systematic uncertainty, see
Methods). Despite the experimental overhead time associated with
operating the device (the overhead time shown in Fig. 1b), the
performance of the memory-assisted BSM node (for N = 248) is
competitive with an ideal unassisted system running at a 4 MHz average
clock rate (Methods).
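
The numerical thresholds quoted in the text can be reproduced from standard textbook expressions. The sketch below is an illustration under stated assumptions, not the authors' key-rate analysis: it uses the one-way secret fraction 1 − 2h(E) for the unconditional-security threshold, the known BB84 individual-attack threshold (1 − 1/√2)/2, and the repeaterless bound −log2(1 − p). Function names are hypothetical.

```python
import math


def binary_entropy(x):
    """Binary Shannon entropy h(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def secret_fraction_one_way(qber):
    """Secret fraction 1 - 2 h(E), clipped at zero (assumed textbook form)."""
    return max(0.0, 1.0 - 2.0 * binary_entropy(qber))


def unconditional_threshold(lo=0.01, hi=0.25, iterations=60):
    """QBER at which 1 - 2 h(E) reaches zero, found by bisection."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if 1.0 - 2.0 * binary_entropy(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def individual_attack_threshold():
    """BB84 error threshold for individual attacks, (1 - 1/sqrt(2)) / 2."""
    return (1.0 - 1.0 / math.sqrt(2.0)) / 2.0


def repeaterless_bound(transmission):
    """Repeaterless bound -log2(1 - p), about 1.44 p for small p."""
    return -math.log1p(-transmission) / math.log(2.0)
```

These expressions give thresholds of about 0.110 and 0.146, and a bound that approaches 1.44p in the high-loss limit, matching the values quoted in the text.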


Outlook 

These experiments demonstrate a form of quantum advantage allowed 
by memory-based communication nodes and represent a crucial step 
towards realizing functional quantum repeaters. Several important 
Fig. 3 | Asynchronous BSMs using quantum memory. a, Example sequence
with N = 6 photonic qubits sent in a single memory time. Microwave π pulses
(green) are interleaved with incoming optical pulses. Photons have fixed
amplitude (red) and qubits are defined by the relative phases between
subsequent pulses (blue). b, Bloch sphere representation of input photonic
time-bin qubits used for characterization. c, Characterization of asynchronous


technical improvements will be necessary to apply this advance to
practical long-distance quantum communication. First, this protocol
must be implemented using truly independent, distant communicating
parties. Second, frequency conversion from telecommunications
wavelengths to 737 nm, as well as low-loss optical elements used for
routeing photons to and from the memory node, will need to be
incorporated. Last, rapid generation of provably secure keys will
require implementation of decoy-state protocols, biased bases and
finite-key error analyses, all compatible with the present approach.
With these improvements, our approach is well-suited for deployment in
real-world settings. It does not require phase stabilization of
long-distance links and operates efficiently in the relevant regime of
about 70 dB channel loss, corresponding to about 350 km of
telecommunications fibre. Additionally, a single device can be used at
the centre of a star network topology, enabling quantum communication
between several parties beyond the metropolitan scale.
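
The correspondence between 70 dB and 350 km implies an attenuation of 0.2 dB per kilometre, a typical figure for telecommunications fibre that is assumed here rather than stated in the text. A minimal conversion sketch, with hypothetical names:

```python
ASSUMED_ATTENUATION_DB_PER_KM = 0.2  # implied by 70 dB over 350 km


def fibre_length_km(loss_db, attenuation_db_per_km=ASSUMED_ATTENUATION_DB_PER_KM):
    """Fibre length whose propagation loss equals the given loss in dB."""
    return loss_db / attenuation_db_per_km


def fibre_loss_db(length_km, attenuation_db_per_km=ASSUMED_ATTENUATION_DB_PER_KM):
    """Propagation loss in dB for a fibre of the given length."""
    return length_km * attenuation_db_per_km
```

This counts propagation loss only; frequency-conversion and coupling losses in a deployed link would add to the total.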

Furthermore, the present approach could be extended along several
directions. The use of long-lived 13C nuclear spin qubits could
eliminate the need to operate at low total ⟨n⟩_m and would provide
longer storage times, potentially enabling 100-fold enhancement of BSM
success rates. Recently implemented strain-tuning capabilities should
allow for operation of many quantum nodes at a common network
frequency. Also, unlike linear-optics-based alternatives, the approach
presented here could be extended to implement the full repeater
protocol, enabling a polynomial scaling of the communication rate with
distance. Last, the demonstrated multi-photon gate operations could
also be adapted to engineer large cluster states of entangled photons,
which can be used for rapid quantum communication. Implementation of
these techniques could enable the realization and application of
scalable quantum networks beyond quantum key distribution, ranging
BSM. Shown are conditional probabilities for Alice and Bob to have sent input
states (i, j) given a particular parity outcome for input states in the X (top) and Y
(bottom) bases. d, Bell test using the CHSH inequality. Conditioned on the BSM
outcome, the average correlation between input photons is plotted for each
pair of bases used (Supplementary Information). Shaded backgrounds denote
the expected parity. Error bars, 68% c.i. See main text for details.


from non-local quantum metrology to modular quantum computing
architectures.
Fig. 4 | Performance of memory-assisted quantum communication. Shown is
a log-log plot of key rate in bits per channel use versus effective channel
transmission (p_{A→B} = ⟨n⟩_p^2, where ⟨n⟩_p is the average number of photons
incident on the measurement device per photonic qubit). Red line, theoretical
maximum for loss-equivalent direct-transmission experiment. Green open
circles, experimentally measured sifted key rate (green line is the expected
rate). To ensure optimal operation of the memory, ⟨n⟩_m = ⟨n⟩_p N = 0.02 is kept
constant (Methods). From left to right, points correspond to N = {60, 124, 248,
504}. Black filled circles, distilled key rates R_s using memory device. Vertical
error bars, 68% c.i.; horizontal error bars, s.d. of the systematic power
fluctuations.
Methods


Experimental set-up 

We perform all measurements in a dilution refrigerator (BlueFors
BF-LD250) with a base temperature of 20 mK. The dilution refrigerator
is equipped with a superconducting vector magnet (American
Magnets Inc. 6-1-1 T), a home-built free-space wide-field microscope
with a cryogenic objective (Attocube LT-APO-VISIR), piezo positioners
(Attocube ANPx101 and ANPx311 series), and fibre and microwave
feedthroughs. Tuning of the nanocavity resonance is performed using
a gas condensation technique. The SiV-cavity system is optically
interrogated through the fibre network without any free-space optics. The
operating temperature of the memory node during the BSM measurements
was 100-300 mK. We note that similar performance at higher
temperatures should be feasible in future experiments by using recent
developments with heavier group-IV colour centres or highly strained
SiV centres. Additional details about the experimental set-up and
device fabrication for millikelvin nanophotonic cavity quantum
electrodynamic experiments with SiV centres are thoroughly described
elsewhere.


Nanophotonic quantum memory 

A spectrum of the SiV-cavity system at large detuning (248 GHz) allows
us to measure the cavity linewidth κ = 21.6 ± 1.3 GHz (Extended Data
Fig. 2a, blue curve) and natural SiV linewidth γ = 0.123 ± 0.010 GHz
(Extended Data Fig. 2a, red curve). We find spectral diffusion of the
SiV optical frequency to be much smaller than γ on minute timescales
with an excitation photon flux of less than 1 MHz. Next, we estimate the
single-photon Rabi frequency, g, using the cavity reflection spectrum
for zero atom-cavity detuning, shown in red in Extended Data Fig. 2a.
For a resonant atom-cavity system probed in reflection from a single
port with cavity-waveguide coupling κ_wg, the cavity reflection coefficient
as a function of probe detuning Δ_c is given by


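$$ r(\Delta_c) = \frac{i\Delta_c + \dfrac{g^2}{i\Delta_c + \gamma/2} - \kappa_{wg} + \dfrac{\kappa}{2}}{i\Delta_c + \dfrac{g^2}{i\Delta_c + \gamma/2} + \dfrac{\kappa}{2}} \qquad (1) $$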


By fitting |r(Δ_c)|² using known values of κ and γ, we obtain the solid
red curve in Extended Data Fig. 2a, which corresponds to a single-photon
Rabi frequency g = 8.38 ± 0.05 GHz, yielding the estimated
cooperativity C = 4g²/(κγ) = 105 ± 11.
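
As a quick numerical cross-check of the fit result, the sketch below evaluates the cooperativity and the reflection coefficient. It assumes the standard definition C = 4g²/(κγ) and a single-port reflection coefficient of the form r = 1 − κ_wg/(iΔ_c + κ/2 + g²/(iΔ_c + γ/2)). The function names are illustrative, and κ_wg = κ/2 (critical coupling) is an assumption, since no value of κ_wg is quoted in this section.

```python
def cooperativity(g: float, kappa: float, gamma: float) -> float:
    """Return C = 4 g^2 / (kappa * gamma); all rates in the same units."""
    return 4.0 * g**2 / (kappa * gamma)


def reflection_coefficient(delta_c: float, g: float, kappa: float,
                           gamma: float, kappa_wg: float) -> complex:
    """Single-port cavity reflection at zero atom-cavity detuning."""
    atom_term = g**2 / (1j * delta_c + gamma / 2.0)
    numerator = 1j * delta_c + atom_term - kappa_wg + kappa / 2.0
    denominator = 1j * delta_c + atom_term + kappa / 2.0
    return numerator / denominator


# Values quoted in the text, in GHz.
G, KAPPA, GAMMA = 8.38, 21.6, 0.123

C = cooperativity(G, KAPPA, GAMMA)
print(f"C = {C:.1f}")  # compare with the quoted 105 +/- 11

# ASSUMPTION: critical coupling, kappa_wg = kappa / 2 (not quoted in the text).
r_coupled = reflection_coefficient(0.0, G, KAPPA, GAMMA, KAPPA / 2.0)
r_empty = reflection_coefficient(0.0, 0.0, KAPPA, GAMMA, KAPPA / 2.0)
print(f"|r|^2 with emitter: {abs(r_coupled)**2:.3f}, without: {abs(r_empty)**2:.3f}")
```

With g set to zero the critically coupled cavity reflects nothing on resonance, while a strongly coupled emitter restores high reflectivity; this contrast is what the spin-dependent reflection relies on.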

Microwave control

We use resonant microwave pulses delivered via an on-chip coplanar
waveguide to coherently control the quantum memory. We measure
the spectrum of the spin-qubit transition by applying a weak, 10-μs-long
microwave pulse of variable frequency, observing the optically
detected magnetic resonance spectrum presented in Extended Data
Fig. 3a. We note that the spin-qubit transition is split by the presence
of a nearby ¹³C. While coherent control techniques can be employed
to use the ¹³C as an additional qubit, we do not control or initialize
it in this experiment. Instead, we drive the electron spin with strong
microwave pulses at a frequency f_Q such that both ¹³C-state-specific
transitions are addressed equally. This also mitigates slow spectral
diffusion of the microwave transition of ~100 kHz.

After fixing the microwave frequency at f_Q, we vary the length of this
drive pulse (τ_R in Extended Data Fig. 3b) and observe full-contrast Rabi
oscillations. We choose a π time of 32 ns in the experiments in the main
text, which is a compromise between two factors: (1) it is sufficiently
fast such that we can temporally multiplex between 2 and 4 time-bin
qubits around each microwave π pulse and (2) it is sufficiently weak
to minimize heating-related effects from high microwave currents in
the resistive gold coplanar waveguide.


With a known π time, we measure the coherence time of the SiV spin
qubit under an XY8-1 dynamical decoupling sequence to exceed 200 μs
(Extended Data Fig. 3c). In the main experiment we use decoupling
sequences with more π pulses. As an example, Extended Data Fig. 3d
shows the population in the |↑⟩ state after the XY8-8 decoupling
sequence (total N_π = 64 π pulses) as a function of τ, half of the
inter-pulse spacing. For BSM experiments, this inter-pulse spacing, 2τ, is
fixed and is matched to the time-bin interval δt. While at some times
(for example, τ = 64.5 ns) there is a loss of coherence due to entanglement
with the nearby ¹³C, at 2τ = 142 ns we are decoupled from this
¹³C and can maintain a high degree of spin coherence. Thus we chose
the time-bin spacing to be 142 ns. The spin coherence at 2τ = 142 ns is
plotted as a function of N_π in Extended Data Fig. 3e, and decreases for
large N_π primarily owing to heating-related effects.


Fibre network 

The schematic of the fibre network used to deliver optical pulses to 
and collect reflected photons from the nanophotonic memory device
is shown in Extended Data Fig. 1b. Photons are routed through the lossy 
(1%) port of a 99:1 fibre beamsplitter to the nanophotonic device. We 
note that for practical implementation of memory-assisted quantum 
communication, an efficient optical switch or circulator should be 
used instead. In this experiment, since we focus on benchmarking the 
performance of the memory device itself, the loss introduced by this 
beamsplitter is incorporated into the estimated channel loss. Reflected 
photons are collected and routed back through the efficient (99%) port 
of the fibre beamsplitter and are sent to the TDI in the heralding set-up. 
The outputs of the TDI are sent back into the dilution refrigerator and 
directly coupled to superconducting nanowire single photon detec- 
tors (SNSPDs, PhotonSpot), which are mounted at the 1 kelvin plate of 
the dilution refrigerator and are coated with dielectrics to optimize 
detection efficiency exactly at 737 nm. 

The total heralding efficiency η of the memory node is an important
parameter since it directly affects the performance of the BSM
for quantum communication experiments. One of the contributing
factors is the detection quantum efficiency (QE) of the fibre-coupled
SNSPDs. To estimate it, we compare the performance of the SNSPDs
to the specifications of calibrated conventional avalanche photodiode
single-photon counters (Laser Components COUNT-10C-FC). The
estimated QEs of the SNSPDs with this method are as close to unity as
we can verify. Additionally, we measure <1% reflection from the
fibre-SNSPD interface, which typically is the dominant contribution to the
reduction of QE in these devices. Thus we assume the lower bound
of the QE of the SNSPDs to be η_QE = 0.99 for the rest of this section.
Of course, this estimation is subject to additional systematic errors.
However, the actual QE of these detectors would be a common factor
(and thus drop out) in a comparison between any two physical quantum
communication systems.

Here we use two different approaches to estimate η. We first measure
the most dominant loss, which arises from the average reflectivity of
the critically coupled nanophotonic cavity (Fig. 2b). While the |↑⟩ state
is highly reflecting (94.4%), the |↓⟩ state reflects only 4.1% of incident
photons, leading to an average device reflectivity of η_sp = 0.493.

In method (1), we compare the input power photodiode M1 with that
of photodiode MC (Extended Data Fig. 1b). This estimates a lower bound
on the tapered-fibre diamond waveguide coupling efficiency of
η_c = 0.930 ± 0.017. This error bar arises from uncertainty due to
photodiode noise and does not include systematic photodiode calibration
uncertainty. However, we note that if the tapered fibre is replaced by
a silver-coated fibre-based retroreflector, this calibration technique
extracts a coupling efficiency of η_c = 0.98, which is consistent with
the expected reflectivity from such a retroreflector. We independently
calibrate the efficiency through the 99:1 fibre beamsplitter and the TDI
to be η_f = 0.934. This gives us our first estimate on the overall heralding
efficiency η = η_sp η_c η_f η_QE = 0.425 ± 0.008.
In method (2), during the experiment we compare the reflected
counts from the highly reflecting (|↑⟩) spin-state measured on the
SNSPDs with the counts on an avalanche photodiode single photon
counting module (M2 in Extended Data Fig. 1b) which has a calibrated
efficiency of ~0.7 relative to the SNSPDs. From this measurement,
we estimate an overall efficiency of fibre-diamond coupling, as well
as transmission through all relevant splices and beamsplitters, of
η_c η_f = 0.864 ± 0.010. This error bar arises from shot noise on the single
photon detectors. Overall, this gives us a consistent estimate of
η = η_sp η_c η_f η_QE = 0.422 ± 0.005. Methods (1) and (2), which each have
independent systematic uncertainties associated with imperfect
photodetector calibrations, are consistent to within a small residual
systematic uncertainty, which is noted in the text where appropriate.
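
The two efficiency estimates can be reproduced by multiplying the quoted factors. The sketch below is only an arithmetic check: the variable names are illustrative, and the reading of the factors as average reflectivity, fibre-diamond coupling, beamsplitter-plus-TDI transmission and detector QE is taken from the surrounding description.

```python
from math import prod

# Spin-averaged reflectivity of the device (|up> reflects 94.4%, |down> 4.1%).
REFLECT_UP, REFLECT_DOWN = 0.944, 0.041
eta_sp = (REFLECT_UP + REFLECT_DOWN) / 2.0

ETA_QE = 0.99            # assumed lower bound on SNSPD quantum efficiency
ETA_COUPLING = 0.930     # method (1): tapered-fibre to waveguide coupling
ETA_FIBRE = 0.934        # method (1): 99:1 beamsplitter and TDI
ETA_COMBINED = 0.864     # method (2): coupling and transmission together

eta_method_1 = prod([eta_sp, ETA_COUPLING, ETA_FIBRE, ETA_QE])
eta_method_2 = prod([eta_sp, ETA_COMBINED, ETA_QE])

print(f"eta_sp   = {eta_sp:.3f}")
print(f"method 1 = {eta_method_1:.3f}")
print(f"method 2 = {eta_method_2:.3f}")
```

Both products fall within the quoted uncertainties of 0.425 ± 0.008 and 0.422 ± 0.005.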


Quantum communication experiment 

An asynchronous BSM (Fig. 3a) relies on (1) precise timing of the arrival
of optical pulses (corresponding to photonic qubits from Alice and
Bob) with microwave control pulses on the quantum memory, and (2)
interferometrically stable rotations on reflected time-bin qubits for
successful heralding, described in Extended Data Fig. 4.

In order to accomplish (1), all equipment used for generation 
of microwave and optical fields is synchronized by a single device 
(National Instruments HSDIO, Extended Data Fig. 1a) with program- 
ming described in Extended Data Tables 1, 2. 

In order to accomplish (2), we use a single, narrow linewidth (<50 kHz)
Ti:sapphire laser (M Squared SolsTiS-2000-PSX-XF, Extended Data
Fig. 1b) both for generating photonic qubits and locking the TDI used
to herald their arrival. In the experiment, photonic qubits are reflected
from the device, sent into the TDI, and detected on the SNSPDs. All
detected photons are processed digitally on a field-programmable
gate array (FPGA, Extended Data Fig. 1a), and the arrival times of these
heralding signals are recorded on a time-tagger (TT, Extended Data
Fig. 1a), and constitute one bit of information of the BSM (m_1 or m_2).
At the end of the experiment, a 30-μs pulse from the readout path is
reflected off the device, and photons are counted in order to determine
the spin state (m_3) depending on the threshold shown in Fig. 2c.

To minimize thermal drift of the TDI, it is mounted on a thermally
weighted aluminium breadboard, placed in a polyurethane-foam-lined
and sand-filled briefcase, and secured with glue to ensure passive
stability on the minute timescale. We halt the experiment and actively
lock the interferometer to the sensitive Y-quadrature every ~200 ms
by changing the length of the roughly 28-m-long (142 ns) delay line
with a cylindrical piezo. In order to use the TDI for X-measurements
of the reflected qubits, we apply a frequency shift of 1.8 MHz using
the qubit AOM, which is 1/4 of the free-spectral range of the TDI. Since
the nanophotonic cavity, the TDI and the SNSPDs are all polarization
sensitive, we use various fibre-based polarization controllers (Extended
Data Fig. 1b). All fibres in the network are covered with aluminium foil
to prevent thermal polarization drifts. This results in an interference
visibility of the TDI of >99% that is stable for several days without any
intervention with laboratory temperature and humidity variations of
±1 °C and ±5%, respectively.
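
The numbers quoted for the delay line are mutually consistent, as the short sketch below illustrates. The free-spectral range of an unbalanced interferometer is taken as the inverse of its delay; the group index of 1.47 is an assumed typical value for silica fibre and is not stated in the text.

```python
C_VACUUM_M_PER_S = 299_792_458.0
GROUP_INDEX = 1.47  # ASSUMPTION: typical silica fibre, not given in the text


def free_spectral_range_hz(delay_s: float) -> float:
    """Free-spectral range of an interferometer with the given path delay."""
    return 1.0 / delay_s


def fibre_length_m(delay_s: float, group_index: float = GROUP_INDEX) -> float:
    """Fibre length that produces the given optical delay."""
    return C_VACUUM_M_PER_S * delay_s / group_index


DELAY_S = 142e-9  # time-bin spacing used in the experiment

fsr_hz = free_spectral_range_hz(DELAY_S)
quarter_fsr_hz = fsr_hz / 4.0
length_m = fibre_length_m(DELAY_S)

print(f"FSR          = {fsr_hz / 1e6:.2f} MHz")
print(f"FSR / 4      = {quarter_fsr_hz / 1e6:.2f} MHz")
print(f"fibre length = {length_m:.1f} m")
```

A quarter of the free-spectral range comes out close to the 1.8 MHz shift applied with the qubit AOM, and the delay corresponds to a fibre length of just under 30 m for the assumed index.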

In order to achieve high-fidelity operations, we have to ensure that
the laser frequency (which is not locked) is resonant with the SiV
frequency f_0 (which is subject to spectral diffusion). To do that, we
implement a so-called preselection procedure, described in Extended
Data Tables 1, 2 and Extended Data Fig. 1a. First, the SiV spin state is
initialized by performing a projective measurement and applying
microwave feedback. During each projective readout, the reflected
counts are compared with two thresholds: a ‘readout’ threshold of 7
photons (used only to record m_3), and a ‘status’ threshold of 3 photons.
The status trigger is used to prevent the experiment from running
in cases when the laser is no longer on resonance with f_0, or if the SiV
has ionized to an optically inactive charge state. The duty cycle of the
status trigger is externally monitored, and is used to temporarily abort
the experiment and run an automated re-lock procedure that locates
and sets the laser to the new frequency f_0, reinitializing the SiV charge
state with a 520 nm laser pulse if necessary. This protocol enables fully
automated operation at high fidelities (low QBER) for several days
without human intervention.


Optimal parameters for asynchronous BSMs 

We minimize the experimentally extracted QBER for the asynchronous
BSM to optimize the performance of the memory node. One major
factor contributing to QBER is the scattering of a third photon that is
not detected, owing to the finite heralding efficiency η = 0.423 ± 0.04.
This is shown in Fig. 2f, where the fidelity of the spin-photon entangled
state diminishes for ⟨n⟩_m ≳ 0.02. At the same time, we would like to
work at the maximum possible ⟨n⟩_m in order to maximize the data rate
to get enough statistics to extract QBER (and in the quantum
communication setting, efficiently generate a key).

To increase the key generation rate per channel use, one can also 
fit many photonic qubits within each initialization of the memory. In 
practice, there are two physical constraints: (1) the bandwidth of the 
SiV-photon interface; and (2) the coherence time of the memory. We 
find that one can satisfy (1) at a bandwidth of roughly 50 MHz with no 
measurable infidelity. For shorter optical pulses (<10 ns), the spin-pho- 
ton gate fidelity is reduced. In principle, the SiV-photon bandwidth can 
be increased by reducing the atom-cavity detuning (here ~60 GHz) 
at the expense of having to operate at higher magnetic fields where 
microwave qubit manipulation is not as convenient.

Even with just an XY8-1 decoupling sequence (number of π pulses
N_π = 8), the coherence time of the SiV is longer than 200 μs (Extended
Data Fig. 3c) and can be prolonged to the millisecond range with longer
pulse sequences. Unfortunately, to satisfy the bandwidth criterion (1)
above, and to drive both hyperfine transitions (Extended Data Fig. 3a),
we must use short (32-ns-long) π pulses, which already cause additional
decoherence from ohmic heating at N_π = 64 (Extended Data Fig. 3e).
Because of this, we limit the pulse sequences to a maximum N_π = 128,
and only use up to ~20 μs of the memory time. One solution would be
to switch to superconducting microwave delivery. Alternatively, we
could use a larger value of τ to allow the device to cool down between
pulses at the expense of having to stabilize a TDI of larger δt. Working
at larger δt would also enable temporal multiplexing by fitting multiple
time-bin qubits per free-precession interval. In fact, with 2τ = 142 ns,
even given constraint (1) and the finite π time, we can fit up to 4 optical
pulses per free-precession window, enabling a total number of photonic
qubits of up to N = 504 for an N_π of only 128.
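
The timing budget in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that the sequence duration is simply the number of π pulses times the inter-pulse spacing, and it only counts how many free-precession windows are needed for a given number of photonic qubits; the helper names are illustrative.

```python
TIME_BIN_NS = 142.0  # inter-pulse spacing 2*tau, matched to the time-bin interval


def sequence_duration_us(n_pi: int, spacing_ns: float = TIME_BIN_NS) -> float:
    """Approximate memory time used by n_pi equally spaced pi pulses."""
    return n_pi * spacing_ns / 1000.0


def windows_needed(n_qubits: int, pulses_per_window: int) -> int:
    """Free-precession windows needed to hold n_qubits optical pulses."""
    return -(-n_qubits // pulses_per_window)  # ceiling division


duration_us = sequence_duration_us(128)
windows = windows_needed(504, 4)

print(f"N_pi = 128 uses about {duration_us:.1f} us of memory time")
print(f"N = 504 at 4 pulses per window needs {windows} windows")
```

The duration is consistent with the quoted ~20 μs, and 504 qubits at 4 pulses per window occupy 126 windows, two fewer than N_π = 128. Why two windows go unused is not stated in this section.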

In benchmarking the asynchronous BSM for quantum communication,
we optimize the parameters ⟨n⟩_m and N to maximize our enhancement
over the direct-transmission approach. The enhancement is a
combination of both increasing N and reducing the QBER, since a large
QBER results in a small distilled key fraction r_s. As described in the main
text, the effective loss can be associated with ⟨n⟩_p, which is the average
number of photons per photonic qubit arriving at the device, and is
given straightforwardly by ⟨n⟩_p = ⟨n⟩_m/N. The most straightforward
way to sweep the loss is to keep the experimental sequence the same
(fixed N) and vary the overall power, which changes ⟨n⟩_m. The results
of such a sweep are shown in Extended Data Fig. 5a, b. For larger ⟨n⟩_m
(corresponding to lower effective channel losses), the errors associated
with scattering an additional photon reduce the performance of
the memory device.

Owing to these considerations, we work at roughly ⟨n⟩_m ≲ 0.02 for
experiments reported in the main text and shown in Figs. 3 and 4, below
which the performance does not improve substantially. At this value,
we obtain BSM successes at a rate of roughly 0.1 Hz. By fixing ⟨n⟩_m and
increasing N, we maintain a tolerable BSM success rate while increasing
the effective channel loss. Eventually, as demonstrated in Extended
Data Fig. 5c and in the high-loss data point in Fig. 4, effects associated
with microwave heating result in errors that again diminish the
performance of the memory node for large N. As such, we conclude
that the optimal performance of our node occurs for ⟨n⟩_m = 0.02 and
N = 124, corresponding to an effective channel loss of 69 dB between
Alice and Bob, which is equivalent to roughly 350 km of
telecommunications fibre.
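
The conversion between channel loss and fibre length can be sketched as follows. The attenuation of 0.2 dB/km is an assumed typical figure for telecommunications fibre; the text quotes only the resulting distance.

```python
import math

FIBRE_LOSS_DB_PER_KM = 0.2  # ASSUMPTION: typical telecom-fibre attenuation


def transmission_to_loss_db(transmission: float) -> float:
    """Convert a transmission probability into a loss in decibels."""
    return -10.0 * math.log10(transmission)


def loss_db_to_fibre_km(loss_db: float,
                        attenuation_db_per_km: float = FIBRE_LOSS_DB_PER_KM) -> float:
    """Fibre length that would produce the given loss."""
    return loss_db / attenuation_db_per_km


length_69_db_km = loss_db_to_fibre_km(69.0)
loss_of_1e_minus_7_db = transmission_to_loss_db(1e-7)

print(f"69 dB  -> {length_69_db_km:.0f} km")
print(f"p = 1e-7 -> {loss_of_1e_minus_7_db:.0f} dB")
```

Under this assumption, 69 dB corresponds to 345 km, in line with the quoted figure of roughly 350 km.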

We also find that the QBER and thus the performance of the
communication link is limited by imperfect preparation of photonic qubits.
Photonic qubits are defined by sending arbitrary phase patterns generated
by the optical arbitrary waveform generator to a phase modulator.
For an example of such a pattern, see the blue curve in Fig. 3a. We use
an imperfect pulse amplifier with finite bandwidth (0.025-700 MHz),
and find that the DC component of these waveforms can result in error
in photonic qubit preparation at the few per cent level. By using a
tailored waveform of phases with smaller (or vanishing) DC component,
we can reduce these errors. We run such an experiment during the test
of the Bell-CHSH inequality. We find that by evaluating BSM correlations
from |±x⟩ and |±y⟩ inputs during this measurement, we estimate
a QBER of 0.097 ± 0.006.

We obtain the effective clock-rate of the communication link by
measuring the total number of photonic qubits sent over the course
of an entire experiment. In practice, we record the number of channel
uses, determined by the number of sync triggers recorded (see
Extended Data Fig. 1a) as well as the number of qubits per sync trigger
(N). We then divide this number by the total experimental time from
start to finish (about 1-2 days for most experimental runs), including
all experimental downtime used to stabilize the interferometer, read
out and initialize the SiV, and compensate for spectral diffusion and
ionization. For N = 248, we extract a clock rate of 1.2 MHz. As the distilled
key rate in this configuration exceeds the conventional limit of p/2 by
a factor of 3.8 ± 1.1, it is competitive with a standard linear-optics-based
system operating at a 4.5 ± 1.3 MHz clock rate.
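
The clock-rate bookkeeping described above amounts to a single division, sketched below. The trigger count and run time in the usage example are hypothetical placeholders, not measured values; only the 1.2 MHz clock rate and the enhancement factor of 3.8 come from the text.

```python
def effective_clock_rate_hz(sync_triggers: int, qubits_per_trigger: int,
                            elapsed_s: float) -> float:
    """Photonic qubits sent per second, including all experimental downtime."""
    return sync_triggers * qubits_per_trigger / elapsed_s


def equivalent_linear_optics_clock_hz(clock_rate_hz: float,
                                      enhancement: float) -> float:
    """Clock rate a loss-equivalent linear-optics system would need to match."""
    return clock_rate_hz * enhancement


# HYPOTHETICAL inputs, chosen only to illustrate the arithmetic.
example_rate_hz = effective_clock_rate_hz(sync_triggers=1_000,
                                          qubits_per_trigger=248,
                                          elapsed_s=0.2)

# Values quoted in the text for N = 248.
equivalent_hz = equivalent_linear_optics_clock_hz(1.2e6, 3.8)

print(f"example clock rate    = {example_rate_hz / 1e6:.2f} MHz")
print(f"equivalent clock rate = {equivalent_hz / 1e6:.2f} MHz")
```

Multiplying the measured 1.2 MHz by the quoted enhancement gives about 4.6 MHz, consistent with the equivalent clock rate stated above.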


Benchmarking memory-assisted operation

A single optical link can provide many channels, for example by making
use of different frequency, polarization or temporal modes. To
account for this, when comparing different systems, data rates can 
be defined on a per-channel-use basis. In a quantum communication 
setting, full usage of the communication channel between Alice and 
Bob means that both links from Alice and Bob to Charlie are in use 
simultaneously. For an asynchronous sequential measurement, typi- 
cally only half of the channel is used at a time, for example from Alice 
to Charlie or Bob to Charlie. The other half can in principle be used for 
a different task when not in use. For example, the unused part of the 
channel could be routed to a secondary asynchronous BSM device. In 
our experiment, we can additionally define as a second normalization 
the rate per channel ‘occupancy’, which accounts for the fact that only 
half the channel is used at any given time. The rate per channel occu- 
pancy is therefore half the rate per full channel use. For comparison, 
we typically operate at 1.2% channel use and 2.4% channel occupancy. 
To characterize the optimal performance of the asynchronous Bell
state measurement device, we operate it in the optimal regime determined
above (N = 124, ⟨n⟩_m ≲ 0.02). We note that the enhancement in
the sifted key rate over direct transmission is given by


$$ \frac{R}{R_{\max}} = \frac{\eta^{2}\,(N_\pi - 1)(N_\pi - 2)\,N_{\mathrm{sub}}}{2N_\pi} \qquad (2) $$

and is independent of ⟨n⟩_m for a fixed number of microwave pulses (N_π)
and optical pulses per microwave pulse (N_sub) and thus fixed N = N_π N_sub.
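
For orientation, equation (2) can be evaluated directly. The sketch below assumes the reading R/R_max = η²(N_π − 1)(N_π − 2)N_sub/(2N_π), which is a reconstruction of a poorly legible expression and should be checked against the published equation. The function name and the pulse numbers in the usage example are illustrative.

```python
def sifted_key_enhancement(eta: float, n_pi: int, n_sub: int) -> float:
    """Evaluate the assumed form of equation (2).

    eta   : heralding efficiency of the memory node
    n_pi  : number of microwave pi pulses
    n_sub : optical pulses per microwave pulse
    """
    return eta**2 * (n_pi - 1) * (n_pi - 2) * n_sub / (2.0 * n_pi)


# Illustrative parameters; eta is the heralding efficiency quoted in the text.
ETA = 0.423
for n_pi, n_sub in [(16, 4), (32, 4), (64, 4)]:
    gain = sifted_key_enhancement(ETA, n_pi, n_sub)
    print(f"N_pi = {n_pi:3d}, N_sub = {n_sub}: R / R_max = {gain:.1f}")
```

For large N_π the expression grows roughly linearly, as η² N_π N_sub / 2, which is why increasing the number of photonic qubits per memory initialization raises the sifted key rate relative to direct transmission.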


For low ⟨n⟩_m, three-photon events become negligible and therefore
QBER saturates, such that the enhancement in the distilled key rate
saturates as well (Extended Data Fig. 5a). We can therefore combine all
data sets with fixed N = 124 below ⟨n⟩_m ≲ 0.02 to characterize the
average QBER of 0.116 ± 0.002 (Fig. 3c). The key rates cited in the main
text relate to a data set in this series (⟨n⟩_m = 0.02), with a QBER of
0.110 ± 0.004. A summary of key rates calculated on a per-channel-use
and per-channel-occupancy basis, as well as comparisons of performance
to an ideal linear-optics BSM and the repeaterless bound are
given in Extended Data Table 4.

Furthermore, we extrapolate the performance of our memory node
to include biased input bases from Alice and Bob. This technique enables
a reduction of channel uses where Alice and Bob send photons in
different bases, but is still compatible with secure key distribution,
allowing for distilled key rates enhanced by at most a factor of 2. The
extrapolated performance of our node for a bias of 99:1 is also displayed
in Extended Data Table 4, as well as comparisons to the relevant
bounds. We note that basis biasing does not affect the performance
when comparing to the equivalent direct-transmission experiment,
which is limited by p_A→B/2 in the unbiased case and p_A→B in the biased
case. However, using biased input bases does make the performance
of the memory-assisted approach more competitive with the fixed
repeaterless bound of 1.44 p_A→B.
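
The gain from basis biasing can be illustrated by the probability that Alice and Bob choose the same basis. The sketch below assumes that this matching probability is what sets the prefactor of the direct-transmission limit; that interpretation is consistent with the factors 1/2 (50:50) and 0.98 (99:1) quoted with Extended Data Table 4, but it is not spelled out in this section.

```python
def matching_basis_fraction(p_x: float) -> float:
    """Probability that two parties independently choose the same basis.

    p_x is the probability of choosing the X basis; Y is chosen otherwise.
    """
    p_y = 1.0 - p_x
    return p_x**2 + p_y**2


unbiased = matching_basis_fraction(0.50)
biased = matching_basis_fraction(0.99)

print(f"50:50 bias -> {unbiased:.2f} of channel uses share a basis")
print(f"99:1 bias  -> {biased:.2f} of channel uses share a basis")
print(f"gain from biasing = {biased / unbiased:.2f}")
```

The gain approaches, but never reaches, the factor of 2 mentioned above as the bias becomes more extreme.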
HSDIO (National Instruments) is a digital signal generator that synchronizes
the experiment. Opt (MW) AWG is a Tektronix AWG7122B 5 GS/s (Tektronix
AWG70001a 50 GS/s) arbitrary waveform generator used to generate photonic
qubits (microwave control signals). All signals are recorded on a time-tagger
(TT, PicoQuant HydraHarp 400). b, Fibre network used to deliver photons to
and collect photons from the memory device, including elements for …
using photodiodes M1, M2 and MC. c, Preparation of optical fields. The desired
phase relation between lock and qubit paths is ensured by modulating AOMs
using phase-locked RF sources with a precise 1.8 MHz frequency shift between
them. The AM (amplitude modulator) and ΦM (phase modulator) are used to
define the photonic qubits.
Extended Data Fig. 2 | Characterization of device cooperativity. a, Cavity
reflection spectrum far-detuned (blue) and on resonance (red) with SiV centre.
Blue solid line is a fit to a Lorentzian, enabling extraction of linewidth
κ = 21.8 GHz. Red solid line is a fit to a model used to determine the single-photon
Rabi frequency g = 8.38 ± 0.05 GHz and shows the onset of a normal
mode splitting. b, Measurement of SiV linewidth far detuned (Δ_ac = 248 GHz)
from cavity resonance. Red solid line is a fit to a Lorentzian, enabling extraction
of natural linewidth γ = 0.123 GHz.
Extended Data Fig. 3 | Microwave characterization of spin-coherence
properties. a, Optically detected magnetic resonance spectrum of the qubit
transition at ~12 GHz split by coupling to a nearby ¹³C. b, Rabi oscillations, read
out via the population in the |↑⟩ state (p_↑), showing a π time of 30 ns. A π time
of 32 ns is used for experiments reported in the main text. c, XY8-1 dynamical
decoupling signal (unnormalized) as a function of total time T, showing
coherence lasting on the timescale of several hundred microseconds. d, XY8-8
dynamical decoupling signal (normalized) revealing a region of high fidelity at
the relevant value of 2τ = 142 ns. e, Fidelity of spin state after a dynamical
decoupling sequence with varying numbers of π pulses (N_π; blue points). Red
point (diamond) is under illumination with ⟨n⟩_m = 0.02.
Extended Data Fig. 4 | Measurements on a single time-bin qubit in Z and X
bases. a, Example of optical pulses sent in the experiment described in Fig. 2d.
b, Time trace of detected photons on the + detector (see Fig. 2a) when the
pulses shown in a are sent directly into the TDI. The first and last peaks
correspond to late and early photons taking the long and short paths of the TDI,
which enable measurements in the Z basis, {|e⟩, |l⟩}. The central bin corresponds
to the late and early components overlapping and interfering constructively to
come out of the + port, equivalent to a measurement of the time-bin qubit in the
|+x⟩ state. A detection event in this same timing window on the other detector
(not shown) would constitute a |−x⟩ measurement. In this measurement, the
TDI was left unlocked, so we observe no interference in the central window.
Extended Data Fig. 5 | Performance of memory device versus channel loss. a,
Enhancement of memory-based approach compared to direct-transmission
approach, keeping N = 124 fixed and varying ⟨n⟩_m in order to vary the effective
channel transmission probability, p_A→B. At high p_A→B (larger ⟨n⟩_m), r_s approaches
0 owing to increased QBER arising from undetected scattering of a third
photon. b, Left, plot of QBER for same sweep of ⟨n⟩_m shown in a. Right, plot of
QBER while sweeping N in order to vary loss. These points correspond to the
same data shown in Fig. 4. At lower p_A→B (larger N), microwave-induced
heating-related dephasing leads to increased QBER. Vertical error bars, 68% confidence
interval; horizontal error bars, s.d. of the systematic power fluctuations.


Extended Data Table 1 | High-level experimental sequence

Step  Process                          Duration   Proceed to
1     Lock time-delay interferometer   200 ms     2
2     Readout SiV                      30 μs      If status LOW: 4, else: 3
3     Apply microwave π pulse          32 ns      2
4     Run main experiment script       ~200 ms    1


This sequence (described by the ‘Step’ number, description of the ‘Process’, approximate ‘Duration’ and conditional step it ‘Proceeds’ to) is programmed into the HSDIO and uses feedback from 
the status trigger sent from the FPGA (see Extended Data Fig. 1a). The main experimental sequence is described in Extended Data Table 2. External software with a response time of 100 ms is 
also used to monitor the status trigger. If it is HIGH for >2s, the software activates an automatic re-lock procedure which compensates for spectral diffusion and ionization of the SiV centre 
(Methods). Additionally, we keep track of the timing when the time-delay interferometer (TDI) piezo voltage reaches a limiting value. This guarantees that the SiV is always resonant with the 
photonic qubits and that the TDI performs high-fidelity measurements in the X basis. 
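
The control flow of Extended Data Table 1 can be written as a small state machine. The sketch below encodes only the "Proceed to" column; the function names and the boolean `status_low` input are illustrative stand-ins for the status trigger sent from the FPGA, and no timing is modelled.

```python
# Each step maps to (process description, rule giving the next step).
SEQUENCE = {
    1: ("Lock time-delay interferometer", lambda status_low: 2),
    2: ("Readout SiV", lambda status_low: 4 if status_low else 3),
    3: ("Apply microwave pi pulse", lambda status_low: 2),
    4: ("Run main experiment script", lambda status_low: 1),
}


def next_step(step: int, status_low: bool) -> int:
    """Return the step that follows `step` for the given status reading."""
    return SEQUENCE[step][1](status_low)


def run(statuses, start: int = 1, max_steps: int = 20):
    """Walk the table, consuming one status reading per SiV readout."""
    readings = iter(statuses)
    step, visited = start, [start]
    for _ in range(max_steps):
        # Only the readout step depends on the status trigger.
        status_low = next(readings, True) if step == 2 else True
        step = next_step(step, status_low)
        visited.append(step)
        if step == 1:  # one full cycle completed
            break
    return visited


# First readout reports status HIGH, second reports LOW.
print(run([False, True]))
```

In this example the first readout fails the status check, so a π pulse is applied and the readout repeated before the main script runs and control returns to the interferometer lock.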
Extended Data Table 2 | Main experimental sequence

Step  Process                               Duration   Proceed to
1     Run sequence in Fig. 3a for a given N 10–20 μs   2
2     Readout SiV + report readout to TT    30 μs      If status LOW: 1, else: 3
3     Apply microwave π pulse               32 ns      4
4     Readout SiV                           30 μs      If status LOW: 3, else: 1


This script is followed until step 1 is run a total of 4,000 times, and then terminates and returns to step 1 of Extended Data Table 1. The longest step is the readout step, which is limited by the fact 
that we operate at a photon detection rate of ~1 MHz to avoid saturation of the SNSPDs. 


Extended Data Table 3 | Truth table of asynchronous BSM protocol

Alice   Bob     Parity   Bell state
|+x⟩    |+x⟩    +1       |Φ+⟩
|+x⟩    |−x⟩    −1       |Φ−⟩
|−x⟩    |+x⟩    −1       |Φ−⟩
|−x⟩    |−x⟩    +1       |Φ+⟩
|+y⟩    |+y⟩    −1       |Φ−⟩
|+y⟩    |−y⟩    +1       |Φ+⟩
|−y⟩    |+y⟩    +1       |Φ+⟩
|−y⟩    |−y⟩    −1       |Φ−⟩

Shown is the parity (and BSM outcome) for each set of valid input states from Alice and Bob.


In the case of Y-basis inputs, Alice and Bob adjust the sign of their input state depending on 
whether it was commensurate with an even- or odd-numbered free-precession interval, based 
on timing information provided by Charlie (Supplementary Information). 
Extended Data Table 4 | Quantum-memory-based advantage

                  per channel occupancy    per channel use
X:Y basis bias    50:50       99:1         50:50       99:1
Distilled key rate R [1077] | 1.197914 DARE Mae 2, AGOt ee 
FE) Resel CY ) PU ater 20650 di3te 3 AAS SS 
R/(1.44p442) 0.717 6.8 140° 5 17 1.437017 2.807 5°33 
1—confidence level Lathe * A072 | Sts RAO | Tete 10" 


Overview of distilled key rates R using the asynchronous BSM device and comparison to ideal direct-communication implementations, based on the performance of our network node for
N = 124 and ⟨n⟩_m = 0.02. Distillable key rates for E = 0.110 ± 0.004 are expressed in a per-channel-occupancy and per-channel-use normalization for unbiased and biased basis choice (X:Y basis
bias) (Methods). Enhancement (fraction of key rates R/R_max and R/(1.44 p_A→B)) is calculated versus the linear optics BSM limit (R_max(50:50) = p_A→B/2 for unbiased bases, R_max(99:1) = 0.98 p_A→B with
biased bases) and versus the fundamental repeaterless channel capacity (1.44 p_A→B). Confidence levels for surpassing the latter bound are given in the final row.
The defining characteristic of Cooper pairs with finite centre-of-mass momentum is
a spatially modulating superconducting energy gap Δ(r), where r is a position.
Recently, this concept has been generalized to the pair-density-wave (PDW) state
predicted to exist in copper oxides (cuprates). Although the signature of a cuprate
PDW has been detected in Cooper-pair tunnelling, the distinctive signature in
single-electron tunnelling of a periodic Δ(r) modulation has not been observed. Here, using a
spectroscopic technique based on scanning tunnelling microscopy, we find strong
Δ(r) modulations in the canonical cuprate Bi2Sr2CaCu2O8+δ that have eight-unit-cell
periodicity or wavevectors Q = (2π/a0)(1/8, 0) and Q = (2π/a0)(0, 1/8) (where a0 is the
distance between neighbouring Cu atoms). Simultaneous imaging of the local density
of states N(r, E) (where E is the energy) reveals electronic modulations with
wavevectors Q and 2Q, as anticipated when the PDW coexists with superconductivity.
Finally, by visualizing the topological defects in these N(r, E) density waves at 2Q, we
find them to be concentrated in areas where the PDW spatial phase changes by π, as
predicted by the theory of half-vortices in a PDW state. Overall, this is a compelling
demonstration, from multiple single-electron signatures, of a PDW state coexisting
with superconductivity in Bi2Sr2CaCu2O8+δ.


The exact nature of the cuprate pseudogap state has been the focus
of extensive research as a route to understanding high-temperature
superconductivity. Attention has recently focused on a pair-density-wave
(PDW) state as a leading candidate to be the fundamental order
parameter that characterizes the pseudogap. This was originally motivated
by transport studies, which led to the hypothesis of ‘stripe
superconductivity’, in which the superconducting order parameter
is spatially modulated and thus a PDW. Equally, the highly unusual
band structure reconstruction near the pseudogap opening temperature
T* as observed by angle-resolved photoemission spectroscopy
can be explained relatively simply by the formation of a PDW state.
Indeed, a wide variety of microscopic theories based on strong, local
electron-electron interactions now envisage a copper oxide (cuprate)
PDW state, while experimental evidence for its existence is rapidly
emerging from multiple techniques.


Characteristic signatures in single-electron tunnelling 
of the PDW state 


Here we focus on the challenge of detecting the cuprate PDW state
using single-electron tunnelling. First, we consider a PDW, whose
spatially dependent energy gap is
$\Delta(\mathbf{r}) = F_P\,\Delta_Q\,[e^{i\mathbf{Q}\cdot\mathbf{r}} + e^{-i\mathbf{Q}\cdot\mathbf{r}}]$,
where $\Delta_Q$ is the amplitude of gap modulations at wavevector Q, r is a position
and $F_P$ is the form factor with either s- or d-symmetry. The most obvious
and immediate prediction is that the single-electron tunnelling should
detect a gap in the density-of-states spectrum N(E) (where E is energy),
which modulates at Q. It is striking, therefore, that no such modulating
Δ(r) has ever been observed in the cuprates. Second, if such a PDW
coexists with d-wave superconductivity (SC), whose homogeneous
gap is $\Delta^{S}(\mathbf{r}) = F_{SC}\,\Delta^{S}$, where $F_{SC}$ exhibits d-symmetry, then
Ginzburg-Landau theory predicts the form of N(r, E) modulations generated by
the interactions between the PDW Δ(r) and the superconducting $\Delta^{S}(\mathbf{r})$.
These modulations are identifiable from products of these two order
parameters that transform as density-like quantities. Thus, considering
the product of the PDW and SC order parameters, $\Delta_Q \Delta^{S*}$ predicts
N(r) ∝ cos(Q·r) modulations at the PDW wavevector Q, while the
product of a PDW with itself, $\Delta_Q \Delta_{-Q}^{*}$, predicts N(r) ∝ cos(2Q·r) at twice
the PDW wavevector. Therefore, a second unique signature of a PDW
with wavevector Q in the superconducting cuprates would be the coexistence
of two sets of N(r, E) modulations at Q and 2Q. Finally, a topological
defect with 2π phase winding in the induced density wave
N(r) ∝ cos(2Q·r) is predicted to generate a local phase winding of
π in the PDW order, at a half-vortex (Fig. 1a). This is the topological
signature of a PDW coexisting with homogeneous SC. Experimental
detection of these phenomena in single-electron tunnelling would
constitute compelling evidence for the PDW state.
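
The harmonic counting in the argument above can be checked with a toy calculation. The sketch below is not the Ginzburg-Landau analysis itself. It assumes, purely for illustration, a one-dimensional real gap Δ(x) = Δ_SC + Δ_Q cos(Qx) and takes the density-like quantity to be Δ(x)², so that the cross term (PDW times SC) modulates at Q and the self term (PDW times PDW) modulates at 2Q. The function name and all numerical values are hypothetical.

```python
import cmath
import math


def harmonic_amplitude(signal, k):
    """Amplitude of the k-th Fourier harmonic of a real, periodic signal."""
    n = len(signal)
    coeff = sum(s * cmath.exp(-2j * math.pi * k * x / n)
                for x, s in enumerate(signal))
    return 2.0 * abs(coeff) / n


# Assumed toy parameters in arbitrary units; not taken from the measurements.
n_sites = 64                  # lattice sites, in units of a0
period = 8                    # PDW period of eight unit cells
delta_sc, delta_q = 1.0, 0.2  # uniform SC gap and PDW amplitude

q = 2.0 * math.pi / period
gap = [delta_sc + delta_q * math.cos(q * x) for x in range(n_sites)]

# Density-like quantity: quadratic in the total order parameter.
density = [g * g for g in gap]

k_q = n_sites // period                         # Fourier index of wavevector Q
amp_q = harmonic_amplitude(density, k_q)        # cross term: 2 * delta_sc * delta_q
amp_2q = harmonic_amplitude(density, 2 * k_q)   # self term: delta_q**2 / 2
amp_3q = harmonic_amplitude(density, 3 * k_q)   # no third harmonic is generated
```

In this toy model the Q harmonic scales with the product Δ_SC Δ_Q and the 2Q harmonic with Δ_Q², which is why the coexistence of both modulations is tied to the coexistence of the PDW with uniform superconductivity.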

To explore these predictions, we use spectroscopic imaging scanning
tunnelling microscopy (SI-STM) with a Bi2Sr2CaCu2O8+δ nanoflake
tip to visualize the single-electron tunnelling. Utilization of
the superconducting tip enhances the energy resolution due to the
Fig. 1 | Schematic of unidirectional $8a_0$ PDW order intertwined with a
density wave. a, Unidirectional PDW order parameter is modulated along the
horizontal axis at eight-unit-cell periodicity. The sign of Δ(r) is coloured in red
for positive and in blue for negative. The periodicity in the unidirectional
density-of-electronic-states N(r) detectable by NIS tunnelling, which is
intertwined with unidirectional PDW, is depicted by a black broken line.
The N(r) wavelength is half that of the unidirectional PDW. When a dislocation
occurs in unidirectional N(r), where the 2π phase winding is realized in its
phase, a possible fluctuation in unidirectional PDW order amplitude and/or
half-vortices is predicted. The relative phase of the spatial variations in the
PDW order and the induced density-wave modulations in N(r) are also
plotted at the top and bottom, respectively. b, Typical SIS topography of
Bi2Sr2CaCu2O8+δ within 40 nm × 40 nm FOV at 5 GΩ junction resistance (I = 30 pA
at $V_{bias}$ = 150 mV). Cu-O-Cu bond directions are parallel to the x and y axes.
Individual Bi atoms are clearly observed as shown in the inset with intra-unit-cell
resolution. c, Spatially averaged SIS g(r, E) spectrum (red) together with that
taken with NIS junction (black). The spatially averaged SIS g(r, E) spectrum
shows particle-hole symmetric peaks with energies at E = ±66 meV. As the
spatially averaged NIS g(r, E) peaks at ±37 meV, we estimate the average gap value
on the Bi2Sr2CaCu2O8+δ nanoflake tip, $\Delta_T$, to be about 29 meV.


convolution of spectra that sharply peak at the superconducting gap
edge, in the density of states $N_T(E)$ of the tip and N(r, E) of the sample.
Thus, energy sensitivity to modulations in Δ(r) should be enhanced with
this superconductor-insulator-superconductor (SIS) tunnelling technique.
To enable detection of the gap modulation, a bulk single-crystal
sample of Bi2Sr2CaCu2O8+δ at the hole density p = 0.17 ± 0.01 (the error
approximately corresponds to a transition width) and superconducting
transition temperature $T_c$ = 91 K is cleaved at room temperature under
ultrahigh vacuum conditions (3 x 10” torr) and then inserted into the 
cryogenic STM head. The superconducting tip is created by picking
up a nanometre-scale Bi2Sr2CaCu2O8+δ flake from the sample to form
the SIS junction. The SI-STM measurements throughout this study
are then all performed using such SIS junctions at T = 9 K. A typical SIS
topography is shown in Fig. 1b for a 40 nm × 40 nm field of view (FOV).
The individual Bi atoms in the BiO plane with subatomic resolution
are resolved as shown in the inset. The CuO2 plane exists about 6 Å
below the BiO plane.


Direct visualization of the periodic energy gap 
modulations 


Differential SIS conductance spectra g(r, E) = dI/dV(r, E = eV), where I
is the tunnelling current, V is the bias voltage and e is the elementary
charge, are then measured as a function of position in this FOV for the
energy range from -150 meV to +150 meV. A typical spatially averaged
g(r, E) spectrum is shown in red in Fig. 1c, together with a normal
metal-insulator-superconductor (NIS) spectrum measured earlier on
the same sample but in a different FOV. The SIS g(r, E) spectrum, being
a convolution of the tip $N_T(E)$ and sample N(E), demonstrates enhanced
energy resolution as expected (red in Fig. 1c). Here, as the spatially
averaged NIS g(r, E) spectrum peaks at ±37 meV, while the equivalent
SIS spectrum peaks at ±66 meV, we estimate the average energy gap
of the tip $\Delta_T$ to be 29 meV.

Next, by measuring half the magnitude of the energy that separates
the SIS spectrum peaks at every location, and then subtracting
$\Delta_T$, we determine the local gap energy map Δ(r) in the sample. A typical
example is shown in Fig. 2a. Figure 2b shows the magnitude of the
power-spectral-density Fourier transform Δ(q) of Δ(r) from Fig. 2a,
where q is a wavevector. Equivalent results have been achieved using
SIS tunnelling with three different Bi2Sr2CaCu2O8+δ nanoflake tips, on
three different samples and with two different STMs (Methods section
‘Motivation of searches for a PDW signature in Δ(r)’). In Fig. 2b, $q_{SM}$
corresponds to a wavevector of the crystal-structure supermodulation.
This supermodulation does indeed produce a type of PDW detectable
Fig. 2 | Superconducting gap energy modulations. a, Measured Δ(r) within
40 nm × 40 nm FOV. The energy-gap data presented here and throughout the
manuscript are all the measured values of the magnitude of Δ(r), which is half
the energy separation between the coherence peaks minus the gap value of the
Bi2Sr2CaCu2O8+δ nanoflake tip. The inset shows a distribution of measured Δ(r)
in the same FOV, ranging from 27 to 56 meV. The eight-unit-cell modulation is
clearly resolved in Δ(r), primarily running along the x direction of Cu-O-Cu
bond exhibiting the unidirectional signature of the PDW. b, The magnitude
Fourier transform of a. Well defined Fourier peaks at Q = (2π/a0)(±1/8, 0) and


by its energy gap modulations; but this PDW is trivial, occurring due to
modulation of the crystal unit-cell dimensions (Methods section ‘Effect
of structural supermodulation on measured Δ(r)’). Second, there is a
very broad peak in Δ(q) surrounding q = 0 due to the well known random
energy-gap disorder of Bi2Sr2CaCu2O8+δ, and this is equivalent to
the broad range of gap values in the histogram inset to Fig. 2a. Finally,
there are four distinct local maxima in Δ(q) at the points indicated by
black solid dots surrounding q = (2π/a0)(0, ±0.125) and q = (2π/a0)(±0.125, 0),
where a0 is the lattice periodicity.
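
The gap-map construction described above (half the separation of the SIS coherence peaks, minus the tip gap, followed by a Fourier transform) can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the authors' analysis code: it uses a one-dimensional line of sites, Lorentzian coherence peaks, a 1 meV energy grid, and the values 29 meV (tip gap), 37 meV (mean gap) and 6 meV (modulation amplitude) quoted in the text as inputs. All function names are hypothetical.

```python
import cmath
import math

DELTA_TIP = 29.0  # meV, tip gap estimated in the text (66 meV - 37 meV)


def sis_spectrum(sample_gap, energies, width=2.0):
    """Synthetic SIS spectrum: Lorentzian peaks at +/-(sample_gap + DELTA_TIP)."""
    peak = sample_gap + DELTA_TIP
    return [1.0 / ((e - peak) ** 2 + width ** 2)
            + 1.0 / ((e + peak) ** 2 + width ** 2) for e in energies]


def local_gap(spectrum, energies):
    """Half the separation of the two coherence peaks, minus the tip gap."""
    e_plus = max((g, e) for g, e in zip(spectrum, energies) if e > 0)[1]
    e_minus = max((g, e) for g, e in zip(spectrum, energies) if e < 0)[1]
    return 0.5 * (e_plus - e_minus) - DELTA_TIP


def fourier_magnitude(values, k):
    """Magnitude of the k-th discrete Fourier component of a real sequence."""
    n = len(values)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * x / n)
                   for x, v in enumerate(values))) / n


energies = [float(e) for e in range(-150, 151)]  # meV, 1 meV steps
n_sites, period = 64, 8                          # sites in units of a0

# Assumed sample gap: 37 meV mean with a 6 meV, eight-unit-cell modulation.
true_gap = [37.0 + 6.0 * math.cos(2.0 * math.pi * x / period)
            for x in range(n_sites)]
gap_map = [local_gap(sis_spectrum(d, energies), energies) for d in true_gap]

# Locate the strongest non-zero Fourier component of the recovered gap map.
magnitudes = [fourier_magnitude(gap_map, k) for k in range(1, n_sites // 2 + 1)]
peak_k = 1 + magnitudes.index(max(magnitudes))
peak_q = peak_k / n_sites  # in units of 2*pi/a0
```

The recovered gap is limited by the energy step of the spectra, which is one reason the sharper SIS coherence peaks help in resolving a modulation of only a few meV.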

These features indicate that there is a strong, if disordered, modulation
in Δ(r), running parallel to the Cu-O-Cu bonds of the CuO2
plane. This modulation exists on top of a non-periodic energy gap
of about 37 meV. It exhibits well defined peaks at $Q_x$ = (2π/a0)(1/8, 0)
and $Q_y$ = (2π/a0)(0, 1/8), meaning that Δ(r) is modulating with about
$8a_0$ periodicity along both axes. Such a variation in Δ(r) can be seen
directly in a series of SIS g(r, E) spectra, extracted along the line in
Fig. 2a and shown in Fig. 2c. Here we see a local demonstration of the
energy of maximum N(r) (that is, of the coherence peak) modulating
at about $8a_0$ periodicity in a particle-hole symmetric fashion with an
amplitude of approximately 6 meV. More fundamentally, line profiles
from Δ(q) in Fig. 2b are plotted in Fig. 2d for both directions. The two
well defined peaks in Fig. 2d characterize a PDW with wavevectors
$Q_x$ = (2π/a0)(0.129 ± 0.003, 0) and $Q_y$ = (2π/a0)(0, 0.118 ± 0.003). This is an
Q = (2π/a0)(0, ±1/8), corresponding to the eight-unit-cell modulations for both
the x and y directions, are observed. H and L denote high and low, respectively.
c, The series of SIS g(r, E) spectra intensity along the line in a. The eight-unit-cell
modulation of the energies of the coherence peaks is clearly resolved. The
modulation amplitude is about 6 meV. d, The line cut of |Δ(q)| along both x and y
directions from b. As seen in b, Q = (2π/a0)(±1/8, 0) and Q = (2π/a0)(0, ±1/8)
peaks are present for both directions, but the peak height is about twice as large for
the x direction as for the y direction, indicating that the PDW is rather unidirectional.


observation of coherent modulation in the superconducting energy
gap Δ(r), and is precisely what is expected for a PDW state. Moreover, it
reveals directly that the cuprate PDW occurs at wavevectors Q = (2π/a0)(1/8, 0)
and Q = (2π/a0)(0, 1/8).


Relationship to PDW visualization using scanned 
Josephson tunnelling microscopy 

Using the same Bi2Sr2CaCu2O8+δ nanoflake tip technology, on samples
at the same doping as herein but operating at millikelvin temperatures,
the magnitude of the Josephson current $|I_J(\mathbf{r})|$ is found to modulate
with a wavelength of about $4a_0$. Thus, modulations of $|I_J(\mathbf{r})|$ and of Δ(r)
are both detectable when using nanoflake tips that are extracted from
the same crystal that is being studied, and are likely in the same
coexisting SC and PDW state. Because the nanoflake tip is extended, an
approximation to planar tunnelling must be considered. Here $I_J$ from
an extended tip to the crystal is composed of two contributions: $I_0$ due
to pair tunnelling from $c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}}$ to $c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}}$ states, and $I_P$ due to pair
tunnelling from $c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}+\mathbf{Q}}$ to $c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}+\mathbf{Q}}$ states, where $c^\dagger_{\mathbf{k}}$ is an electron creation
operator at a momentum k, which are independent of each other when
pair momentum is conserved (Methods section ‘Independent pair
tunnelling process’). In scanned Josephson tunnelling microscopy, the
circuitry measures the Josephson critical current
magnitude: $|I_J| = |I_0| + |I_P|$, for which $|I_0|$ is roughly constant spatially
but $|I_P| \propto |\sin(Q_P\delta)|$, where δ is the spatial displacement between the
PDW in the extended tip and the PDW in the sample (Methods section
‘Independent pair tunnelling process’). Under these circumstances, if
the PDW has periodicity $8a_0$, its gap modulates with periodicity $8a_0$,
but the magnitude of the total Josephson current $|I_J|$ will have periodicity
$4a_0$. This is the specific phenomenology detectable using the
Bi2Sr2CaCu2O8+δ nanoflake tips for SIS spectroscopy and to measure the
magnitude of Josephson critical currents, respectively. Furthermore,
enhanced sensitivity to the basic energy modulations when using SIS
spectroscopy is consistent with a ‘lock-in’ effect from a PDW state in
the nanoflake tip (Methods section ‘Effect of Δ(r) on nanoflake’).


Detection of two unidirectional PDWs within distinct 
nanoscale domains 

Next, to explore the unidirectionality of Δ(r), we employ a two-dimensional
lock-in technique to determine the amplitude and phase of the
modulations. Thus

$$A_Q(\mathbf{r}) = \int d\mathbf{R}\, A(\mathbf{R})\, e^{i\mathbf{Q}\cdot\mathbf{R}}\, e^{-\frac{(\mathbf{r}-\mathbf{R})^2}{2\sigma^2}} \qquad (1)$$

$$|A_Q(\mathbf{r})| = \sqrt{[\mathrm{Re}\,A_Q(\mathbf{r})]^2 + [\mathrm{Im}\,A_Q(\mathbf{r})]^2} \qquad (2)$$

$$\Phi^{A}_{Q}(\mathbf{r}) = \tan^{-1}\frac{\mathrm{Im}\,A_Q(\mathbf{r})}{\mathrm{Re}\,A_Q(\mathbf{r})} \qquad (3)$$

where A(r) represents any arbitrary real space image, Q the wavevector
of interest and σ the averaging length-scale in r-space (or equivalently
the cut-off length in q-space). The key ingredients of such an analysis
are the amplitude $|A_Q(\mathbf{r})|$ and the spatial phase $\Phi^{A}_{Q}(\mathbf{r})$ of modulations
at Q. Using this technique on our Δ(r) data, Fig. 3a, b shows the amplitudes
of the PDW for the x and y directions, $|\Delta_{Q_x}(\mathbf{r})|$ and $|\Delta_{Q_y}(\mathbf{r})|$,
respectively. The local ‘magnitude’ of PDW unidirectionality is then defined as

$$F(\mathbf{r}) = \frac{|\Delta_{Q_x}(\mathbf{r})| - |\Delta_{Q_y}(\mathbf{r})|}{|\Delta_{Q_x}(\mathbf{r})| + |\Delta_{Q_y}(\mathbf{r})|} \qquad (4)$$

When F(r) > 0, represented in orange, the PDW along the x direction is
dominant, and similarly when F(r) < 0, represented in blue, the PDW
order along the y direction is dominant. As shown in Fig. 3c, F(r) is spatially
heterogeneous, forming a domain structure indicating that the
cuprate PDW Δ(r) is microscopically unidirectional, with one direction
predominant in any particular domain. In addition, it appears that
the domain size in orange is much bigger than that of blue within the
40 nm × 40 nm FOV, which may indicate a vestigial nematic PDW state,
although one cannot be certain without independent knowledge of the
shape anisotropy of the nanoflake tip. Overall, these data indicate that
the cuprate PDW state is locally strongly unidirectional, and possibly
in a vestigial nematic state due to quenched disorder.
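
The two-dimensional lock-in analysis of equations (1)-(4) can be sketched as follows. This is a minimal illustration on a synthetic image, not the authors' code. It evaluates the Gaussian-windowed demodulation at single points by direct summation, and it divides by the summed Gaussian weights so that a pure cosine of unit amplitude returns about 1/2; that normalisation, the 32 × 32 test image, σ = 4 lattice units and all function names are assumptions made for the example.

```python
import cmath
import math


def lock_in(image, qx, qy, x0, y0, sigma):
    """Gaussian-windowed demodulation of `image` at wavevector (qx, qy),
    evaluated at the point (x0, y0); compare equation (1)."""
    total, weight_sum = 0j, 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            w = math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))
            total += value * cmath.exp(1j * (qx * x + qy * y)) * w
            weight_sum += w
    return total / weight_sum  # normalisation is an assumption of this sketch


def unidirectionality(image, q, x0, y0, sigma):
    """F(r) of equation (4) from the x- and y-directed lock-in amplitudes."""
    amp_x = abs(lock_in(image, q, 0.0, x0, y0, sigma))  # equation (2) at Q_x
    amp_y = abs(lock_in(image, 0.0, q, x0, y0, sigma))  # equation (2) at Q_y
    return (amp_x - amp_y) / (amp_x + amp_y)


# Synthetic map in units of a0: modulated along x on the left half and
# along y on the right half, both with eight-unit-cell periodicity.
size = 32
q = 2.0 * math.pi / 8.0
image = [[math.cos(q * x) if x < size // 2 else math.cos(q * y)
          for x in range(size)] for y in range(size)]

f_left = unidirectionality(image, q, 8, 16, sigma=4.0)    # inside the x domain
f_right = unidirectionality(image, q, 24, 16, sigma=4.0)  # inside the y domain

# Spatial phase of the x-directed modulation, as in equation (3).
phase_left = cmath.phase(lock_in(image, q, 0.0, 8, 16, 4.0))
```

The choice of σ trades spatial resolution against rejection of other wavevectors: a window much shorter than the modulation period lets the counter-rotating 2Q term and the orthogonal modulation leak into the amplitude.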


How a coexisting PDW and superconductor induce the
CDW modulations


Although the SI-STM technique cannot be used to image a charge
density ρ(r) or any of its modulations, a mapping of g(r, E) and its ratio
Z(r, E) = g(r, +E)/g(r, -E) enables one to study how the related N(r, E)
modulates. It has been found that the form-factor symmetry for the
induced CDW in cuprates exhibits primarily d-symmetry. In that case,
the CDW modulation does not appear primarily at Q and 2Q in the
Fourier transform of g(r, E) or Z(r, E). Instead, to detect the d-symmetry
form factor CDW signal at Q and 2Q, one must first use the d-symmetry
Fig. 3 | The PDW order parameter amplitude and phase. a, The spatial
variation of the PDW amplitude $|\Delta_{Q_x}(\mathbf{r})|$ along the x direction obtained by the
two-dimensional lock-in technique. b, The spatial variation of the PDW
amplitude $|\Delta_{Q_y}(\mathbf{r})|$ along the y direction obtained by the two-dimensional
lock-in technique. The cut-off length is denoted by the broken circle. c, The
local directionality map defined by equation (4). An orange domain is the
region where the PDW amplitude along the x axis is predominant, while a blue
domain is the region where the PDW amplitude along the y axis is
predominant.
Fig. 4 | The interplay of N(r) and PDW, and the possible half-vortices. a, The
spatial variation of the N(r) amplitude $|D^Z_{2Q}(\mathbf{r})|$ obtained by equations (1) and (2).
The inset shows an azimuthal angular averaged cross-correlation coefficient as
a function of distance. b, The spatial variation of the $|\Delta_{Q}(\mathbf{r})|$ amplitude obtained
by equations (1) and (2). c, The magnitude of the phase-resolved Fourier
transform, $|D^Z(\mathbf{q})|$, exhibiting both Q = (2π/a0)(±1/8, 0) and 2Q = (2π/a0)(±1/4, 0)
peaks, encircled by red broken lines, respectively. Coordinates are in units
of 2π/a0. d, The line cut of $|D^Z(\mathbf{q})|$, in which the Lorentzian background is
subtracted, in c from (2π/a0)(0, 0) to (2π/a0)(+1/4, 0), exhibiting well defined
peaks at Q and 2Q. Data points are fitted by Lorentzians and the obtained peak
positions are (2π/a0)(0.113 ± 0.002) and (2π/a0)(0.241 ± 0.003) for Q and 2Q,
respectively, with the peak widths (2π/a0)(0.016 ± 0.004) and
(2π/a0)(0.068 ± 0.006) for Q and 2Q, respectively. e, The spatial phase of the 2Q N(r)
order $\Phi^{D}_{2Q}(\mathbf{r})$ obtained by equation (3). 2π topological defects are marked by
solid dots. White (black) dots indicate 2π phase winding along clockwise
(anticlockwise). f, The spatial phase of the PDW $\Phi^{\Delta}_{Q}(\mathbf{r})$. The 2π topological
defects in $\Phi^{D}_{2Q}(\mathbf{r})$ from e are plotted on top of $\Phi^{\Delta}_{Q}(\mathbf{r})$. The inset shows the
distribution of $\Phi^{\Delta}_{Q}(\mathbf{r})$ values at all the locations where the 2π topological
defects in $\Phi^{D}_{2Q}(\mathbf{r})$ are found. The blue crosses are the count and the horizontal
bars represent the bin size.
sublattice-phase-resolved Fourier analysis (Methods section ‘Sublattice
phase-resolved analysis’). For this reason, we apply a phase-resolved
visualization of the d-symmetry modulations to our measured Z(r, E),
extracting the value of Z(r, E) at the oxygen sites within each
CuO2 unit cell: $O^Z_x(\mathbf{r}) = Z(\mathbf{r})\,\delta(\mathbf{r} - \mathbf{r}_{O_x})$ at $O_x$ and similarly for $O^Z_y(\mathbf{r})$ at $O_y$.
We then subtract these values throughout the image to yield

$$D^Z(\mathbf{r}) = O^Z_x(\mathbf{r}) - O^Z_y(\mathbf{r}) \qquad (5)$$

In Fig. 4a, b, we show the amplitudes of $|\Delta_{Q}(\mathbf{r})|$ and $|D^Z_{2Q}(\mathbf{r}, E = 54\ \mathrm{meV})|$;
systematics of the q-space cut-off length used are discussed
in Methods section ‘Cut-off dependence’. If we then consider the magnitude
of the Fourier transform of $D^Z(\mathbf{r}, E)$ for E = 54 meV, where SIS
tunnelling has the maximum energy sensitivity (Fig. 1c), a key fact
emerges. In the Fourier transform $|D^Z(\mathbf{q}, 54\ \mathrm{meV})|$, we find two strong
peaks at Q and 2Q (Fig. 4c), which are the clearest in these data when
presented along the line (0, 0)-(2π/a0)(0.4, 0) in Fig. 4d (from which
a Lorentzian background has been subtracted). This complex density
wave structure is the expected signature in N(r, E) modulations of
the PDW with wavevector Q coexisting with the homogeneous SC.

Finally, utilizing the two-dimensional lock-in technique to generate
phase-resolved images, the spatial phase $\Phi^{D}_{2Q}(\mathbf{r})$ of $D^Z_{2Q}(\mathbf{r}, 54\ \mathrm{meV})$ is
extracted using equation (3), and is shown in Fig. 4e. The topological
defects with 2π phase winding in the N(r, 54 meV) density wave are
marked by the black and white dots, for which the winding direction
is clockwise and anticlockwise, respectively. The presence of these 2π
topological defects in the N(r, 54 meV) density wave at 2Q is due
microscopically to a dislocation as schematically shown in Fig. 1a (black line).
To visualize the interplay of the 2Q density wave and the PDW, the spatial
phase $\Phi^{\Delta}_{Q}(\mathbf{r})$ of the PDW order is extracted in the same way, but now
at (2π/a0)(±1/8, 0). In Fig. 4f, the locations of the $\Phi^{D}_{2Q}(\mathbf{r})$ topological
defects from Fig. 4e are also plotted on top of the PDW spatial phase
$\Phi^{\Delta}_{Q}(\mathbf{r})$. Intriguingly, the $\Phi^{D}_{2Q}(\mathbf{r})$ topological defects are always found
in the vicinity of the yellow strings in $\Phi^{\Delta}_{Q}(\mathbf{r})$, where the PDW phase is π
(see Extended Data Fig. 8 for an evolution of the PDW spatial phase).
The inset in Fig. 4f shows that the probability distribution of the PDW
phase $\Phi^{\Delta}_{Q}$, at which all the topological defects in $\Phi^{D}_{2Q}(\mathbf{r})$ are found, is
clearly centred around π (see Extended Data Fig. 7 for an independent
analysis yielding the same conclusion). Thus, the local phase in the
PDW surrounding the topological defects in $\Phi^{D}_{2Q}(\mathbf{r})$ always changes
by approximately π (see Extended Data Fig. 8), precisely as expected
when a topological defect in the induced density wave at 2Q interacts
with the PDW order.


Multiple single-electron signatures of a PDW coexisting 
with SC 


To summarize, use of Bi2Sr2CaCu2O8+δ nanoflake scanned tips allows
the detection of the spatially modulating energy gap Δ(r) with
eight-unit-cell periodicity, or with axial wavevectors Q = (2π/a0)(1/8, 0) and
Q = (2π/a0)(0, 1/8), in superconducting Bi2Sr2CaCu2O8+δ (Fig. 2). The
spatial analysis of the Δ(r) modulations shows that they are rather
unidirectional within nanoscale domains (Fig. 3). Simultaneous
density-of-states imaging reveals two pairs of coexisting N(r, E) modulations,
at wavevectors Q = (2π/a0)(1/8, 0) and Q = (2π/a0)(0, 1/8), and
2Q = (2π/a0)(1/4, 0) and 2Q = (2π/a0)(0, 1/4) (Fig. 4c, d). Finally, the
topological defects in the N(r, E) density wave at 2Q are concentrated
along the lines where the PDW spatial phase changes by π (Fig. 4f). All of
these phenomena are canonical signatures of a PDW coexisting
with homogeneous SC. Thus, Δ(r) modulation imaging provides direct
spectroscopic evidence of the existence of a PDW, at zero magnetic
field in cuprates.
Methods


Sample preparation and measurement 

High-quality Bi2Sr2CaCu2O8+δ single crystals were grown by the
travelling-solvent floating zone method. Here we measured a sample
with hole doping level of p = 0.17. The sample, covered by the Kapton
tape, was loaded into the load-lock chamber with pressure better than 
3 x 107° torr and quickly inserted into the STM head at 7= 9K, after 
cleavage by removing the Kapton tape. 


Tip preparation and characterization 

The tip isotropy is checked by comparing the height of the Bragg peaks
for x and y directions in the Fourier transform T(q) of topographic
images T(r) taken using the nanoflake tip. A 40 nm × 40 nm FOV T(r) and its
real-part Fourier transform Re T(q) are shown in Extended Data Fig. 1a, b,
respectively. Extended Data Fig. 1c shows line profiles of Re T(q) along
the lines in Extended Data Fig. 1b across the Bragg peaks at (1, 0) and
(0, 1). Bragg peak heights at (1, 0) and (0, 1) are found to be comparable
within 7%.


Motivation and model for A(r) modulation detection 

Motivation of searches for a PDW signature in Δ(r). Here we discuss
preliminary Δ(r) data, as shown in Extended Data Fig. 4a, c, that motivated,
and provide reinforcement for, the data presented in this study,
which is completely independent of them. These data in Extended Data
Fig. 4 were acquired with two different Bi2Sr2CaCu2O8+δ nanoflake tips,
on two different samples, using two different SI-STM instruments.
Although experimental conditions were not optimized for detection
of Δ(r) modulations in a PDW, the peaks in Δ(q) at Q = (2π/a0)(±0.125, 0) and
Q = (2π/a0)(0, ±0.125) are weakly visible as marked by dashed white
circles in Extended Data Fig. 4b, d. Such data, along with those reported
in the main text from an experiment designed and optimized for the
purpose, provide the type of experimental evidence available on the
existence of $8a_0$ modulations in Δ(r).


Independent pair tunnelling process. Here we discuss how the $4a_0$
modulation observed in the magnitude of the Josephson critical current
and the $8a_0$ modulation observed in Δ(r) in the present study may be
linked. We consider a simple model for pair tunnelling from a nanoflake
Bi2Sr2CaCu2O8+δ tip that is in the coexisting SC and PDW state, to the
parallel surface of a bulk Bi2Sr2CaCu2O8+δ crystal in the same state
(Extended Data Fig. 3). Because the tip is extended and parallel to the
surface, the effects of planar tunnelling must be considered. In perfect
planar tunnelling, the $c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}}$ Cooper pairs of the homogeneous SC cannot
tunnel into the $c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}+\mathbf{Q}}$ Cooper pairs of the PDW, because that violates
conservation of momentum. In that limit, the Josephson current $I_J$ from
a Bi2Sr2CaCu2O8+δ extended tip to the Bi2Sr2CaCu2O8+δ crystal is composed
of two independent contributions: $I_0$ due to pair tunnelling from
$c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}}$ to $c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}}$ states, and $I_P$ due to pair tunnelling from
$c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}+\mathbf{Q}}$ to $c^\dagger_{\mathbf{k}} c^\dagger_{-\mathbf{k}+\mathbf{Q}}$ states.

Consider two PDWs, one in the nanoflake tip, $\Psi_1$, and one in the sample,
$\Psi_2$, with order parameters


Y= A, e1(e22*) and Y= A,e!2(e!) 


where x is the position and θ is the phase of the order parameter.
The Josephson coupling will be of the form
$K(\Psi_1\Psi_2^{*} + \Psi_1^{*}\Psi_2) \propto \cos(\theta_1 - \theta_2)\cos(Q\delta)$, where K is a constant and δ is
the variable spatial displacement of the tip PDW relative to the sample PDW.
In this case, the inter-PDW Josephson current takes the form

$$I_P \propto \sin(\theta_1 - \theta_2)\,\sin(Q\delta)$$

It is the magnitude $|I_P|$ that is measured as a function of transverse
displacement δ between nanoflake tip and sample, where Q = 2π/λ and λ is a
wavelength, and this obviously modulates as $|I_P| \propto |\sin(Q\delta)|$ or with a
periodicity of λ/2.

Our previous studies using scanned Josephson tunnelling actually
measured the magnitude of the Josephson current $|I_J|$. Thus, if $I_0$ and $I_P$
are independent, then $|I_J| = |I_0| + |I_P|$. Assuming that $|I_0|$ is roughly
constant spatially, then $|I_P| \propto |\sin(Q\delta)|$, where δ is the transverse
displacement between the PDW in the extended tip and the PDW in the
sample. Therefore, in this model for our experiment, if the PDW has
true periodicity $8a_0$ then its gap modulation Δ(r) will necessarily have
periodicity $8a_0$ but, critically, the modulations in magnitude of the
total Josephson current $|I_J|$ will have periodicity $4a_0$.
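
The period halving in this model follows from the absolute value alone and can be verified numerically. The sketch below is a minimal illustration: the relative sizes of the uniform and PDW contributions (1.0 and 0.3) are arbitrary assumed values, the displacement is sampled on integer multiples of a0, and the function names are hypothetical.

```python
import math


def josephson_magnitude(displacement, period, i_uniform=1.0, i_pdw=0.3):
    """|I_J| = |I_0| + |I_P|, with |I_P| proportional to |sin(Q * displacement)|."""
    q = 2.0 * math.pi / period
    return abs(i_uniform) + abs(i_pdw) * abs(math.sin(q * displacement))


def smallest_period(values, tol=1e-9):
    """Smallest cyclic shift (in samples) that maps the sequence onto itself."""
    n = len(values)
    for shift in range(1, n):
        if all(abs(values[i] - values[(i + shift) % n]) < tol for i in range(n)):
            return shift
    return n


n_sites, pdw_period = 32, 8  # in units of a0

# Gap modulation of the PDW itself, and the measured current magnitude.
gap = [math.cos(2.0 * math.pi * x / pdw_period) for x in range(n_sites)]
current = [josephson_magnitude(x, pdw_period) for x in range(n_sites)]

gap_period = smallest_period(gap)          # period of the gap modulation
current_period = smallest_period(current)  # period of |I_J|
```

With an eight-unit-cell PDW the gap sequence repeats every eight sites while the current magnitude repeats every four, matching the $8a_0$ and $4a_0$ periodicities discussed above.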

Note that if there are two strictly independent unidirectional PDWs
with wavevectors $Q_x$ and $Q_y$, and Cooper pair momentum of each is
conserved, then the $Q_x$ PDW cannot tunnel to the $Q_y$ PDW and vice
versa. This would pose a challenge to the above analysis. However, if
the PDW state in the tip is somewhat biaxial, then
this analysis would retain validity.


Effect of structural supermodulation on measured Δ(r). One might
ask whether there is an effect of the crystal supermodulation with $Q_{SM}$ ∥
(1, 1)(2π/a0) that produces an energy gap modulation at its wavevector,
on our measured Δ(r). As we reported previously, we observed the modulations
both in Δ(r) and the Josephson critical current at $Q_{SM}$. However, this
is a trivial effect and its wavevector is at 45° off the Cu-O-Cu
direction. Most importantly, a spatial convolution between the tip and
sample of their modulating Δ(r) at $Q_{SM}$ cannot produce any additional
modulations at different wavevectors. Thus, the effect of structural
supermodulation does not produce any other gap modulation signals,
especially at Q = (2π/a0)(±0.125, 0) and Q = (2π/a0)(0, ±0.125).


Effect of Δ(r) on nanoflake. Here we discuss how Δ(r) modulation
detection is enhanced in Bi2Sr2CaCu2O8+δ nanoflake SIS tunnelling.
Here the measured Δ(r) can be regarded as a consequence of a spatial
convolution between the sample and nanoflake PDW order parameters.
The nanoflake is most likely in the same PDW state as it is picked up
from the same sample. Thus, the order parameter on the nanoflake is


approximated ina form Of A, onake XD (iQp rexp|-Z 5) where the 


exponential term is approximated to represent a nanoflake structure
factor with size of nanoflake (about 3 nm, see Extended Data Fig. 2). This
acts as a low-pass filter in the convolution between gap modulations at
the same wavevector $Q_P$ in the tip and in the sample. Such a convolution
effect naturally works as a ‘lock-in’, mitigating the signals unrelated to
the gap modulation wavevector $Q_P$. This process makes the signal of
Δ(r) modulation detectable.


Data analysis 
Sublattice phase-resolved analysis. To reveal any possible
local-density-of-states N(r, E) modulations, we analyse differential conductance
g(r, E) to yield Z(r, E) = g(r, +E)/g(r, -E) (Extended Data Fig. 5).
Intensities at oxygen sites $\mathbf{r}_{O_x}$ and $\mathbf{r}_{O_y}$ are extracted separately
from Z(r, E = 54 meV) and used to form two new maps, $O^Z_x(\mathbf{r}, E = 54\ \mathrm{meV})$
and $O^Z_y(\mathbf{r}, E = 54\ \mathrm{meV})$, respectively. We then calculate each
sublattice-phase-resolved Z(r, E) image and separate it into three: the first, Cu(r),
contains only the measured values of Z(r) at Cu sites while the other
two, $O_x(\mathbf{r})$ and $O_y(\mathbf{r})$, contain only the measurements at the x-/y-axis
oxygen sites.

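Phase-resolved Fourier transforms of the $O_x(\mathbf{r})$ and $O_y(\mathbf{r})$ sublattice
images, $\tilde{O}_x(\mathbf{q}) = \mathrm{Re}\,\tilde{O}_x(\mathbf{q}) + i\,\mathrm{Im}\,\tilde{O}_x(\mathbf{q})$ and
$\tilde{O}_y(\mathbf{q}) = \mathrm{Re}\,\tilde{O}_y(\mathbf{q}) + i\,\mathrm{Im}\,\tilde{O}_y(\mathbf{q})$, are
used to determine the form factor symmetry for modulations at any q:

$$\tilde{D}^Z(\mathbf{q}) = (\tilde{O}_x(\mathbf{q}) - \tilde{O}_y(\mathbf{q}))/2$$

$$\tilde{S}'^Z(\mathbf{q}) = (\tilde{O}_x(\mathbf{q}) + \tilde{O}_y(\mathbf{q}))/2$$

$$\tilde{S}^Z(\mathbf{q}) = \widetilde{\mathrm{Cu}}(\mathbf{q})$$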


Specifically, for a density wave occurring at Q, one can then evaluate
the magnitude of its d-symmetry form factor D(Q) and its s'- and
s-symmetry form factors S'(Q) and S(Q), respectively. In terms of the
segregated sublattices, a d-form factor density wave is one for which
the density wave on the $O_x$ sites is in antiphase with that on the $O_y$ sites.
Studies of electronic structure in underdoped Bi2Sr2CaCu2O8+δ and
Ca2-xNaxCuO2Cl2 consistently exhibit a relative phase of π and therefore
a d-symmetry form factor.

Hence the peaks at ±$Q_x$ and ±$Q_y$ present in both $\tilde{O}_x(\mathbf{q})$ and $\tilde{O}_y(\mathbf{q})$ must
cancel exactly in $\tilde{O}_x(\mathbf{q}) + \tilde{O}_y(\mathbf{q})$. Therefore, if a density wave at Q and
2Q has a predominantly d-symmetry form factor, there is no detectable
signal in g(r, E) or Z(r, E) at Q and 2Q, which is why the d-symmetry Fourier
transforms D(q, E) or $D^Z(\mathbf{q}, E)$ are used in these studies. Specifically, by
calculating $D^Z(\mathbf{q}) = \mathrm{FFT}(D^Z(\mathbf{r}))$ one correctly extracts the d-symmetry
density wave modulations that are occurring at Q and 2Q.
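
The cancellation argument above can be illustrated with a one-dimensional toy example. The sketch below assumes two synthetic oxygen-sublattice signals that carry the same modulation with a relative phase of π (a pure d-symmetry form factor). The modulation depth of 0.1, the 32-cell length and the function name are hypothetical choices for the illustration, and a direct discrete Fourier sum stands in for the FFT.

```python
import cmath
import math


def fourier_component(values, k):
    """k-th discrete Fourier component of a real sequence, normalised by length."""
    n = len(values)
    return sum(v * cmath.exp(-2j * math.pi * k * x / n)
               for x, v in enumerate(values)) / n


n_cells, period, amplitude = 32, 8, 0.1
q = 2.0 * math.pi / period

# Antiphase modulation on the two oxygen sublattices (relative phase of pi).
o_x = [1.0 + amplitude * math.cos(q * x) for x in range(n_cells)]
o_y = [1.0 - amplitude * math.cos(q * x) for x in range(n_cells)]

k_q = n_cells // period  # Fourier index of the modulation wavevector
ox_q = fourier_component(o_x, k_q)
oy_q = fourier_component(o_y, k_q)

d_form = abs(ox_q - oy_q) / 2.0        # d-symmetry channel: survives
s_prime_form = abs(ox_q + oy_q) / 2.0  # s'-symmetry channel: cancels
```

The sum channel vanishes at the modulation wavevector while the difference channel retains the full signal, which is the reason the difference image is used to search for the peaks at Q and 2Q.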


Cut-off dependence. Here we show how the images shown in Fig. 4
evolve as a function of cut-off length used in the two-dimensional lock-in
technique. In Extended Data Fig. 6, both $D^Z_{2Q}(\mathbf{r}, 54\ \mathrm{meV})$ and $\Delta_{Q}(\mathbf{r})$
are shown at different real-space cut-off lengths: 8, 16, 24, 32 and 40 Å.
In the left column, we can see a big change between 8 and 16 Å in the
spatial structure of $|D^Z_{2Q}(\mathbf{r})|$ as oscillatory components vanish,
while $|D^Z_{2Q}(\mathbf{r})|$ at 16, 24, 32 and 40 Å are virtually identical. For $\Delta_{Q}(\mathbf{r})$ in
the right column in Extended Data Fig. 6, the oscillatory components
are gone between 16 and 24 Å. Thus, the cut-off lengths used in Fig. 3,
16 and 40 Å, do not introduce erroneous oscillations by picking up
irrelevant contributions from other wavevectors and are reasonable
choices.


Interplay of the eight-unit-cell periodic PDW and the four-unit-cell
induced N(r, E) modulation. To support the Fig. 4f inset, in which 2π
topological defects in the induced N(r) modulation at 2Q tend to be
found in the vicinity of the locus of π phase in $\Phi^{\Delta}_{Q}(\mathbf{r})$ (yellow strings),
we performed an independent analysis: the distances of the white and
black dots to the nearest position on the yellow strings are calculated
and compared with randomly distributed results. Extended Data Fig. 7a
shows the distance distribution of the total 25 topological defects in
Fig. 4e. Then we generate 25 randomly distributed ‘topological defects’
inside the same FOV and calculate distances to the same yellow strings,
and this process is repeated 100 times. The average result of
the 100 different configurations is shown in Extended Data Fig. 7b.

It is clear that the distribution from the measured N(r) topological
defects at 2Q is in a smaller range with higher magnitude compared
with random results. This supports that the topological defects in the
measured N(r) modulation at 2Q actually show a statistically strong
tendency to be found near the locus of π phase in $\Phi^{\Delta}_{Q}(\mathbf{r})$.
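
The comparison with randomly placed defects described above can be sketched as a small Monte Carlo calculation. Everything in the sketch is synthetic and hypothetical: the "strings" are idealized as four straight vertical lines, the "measured" defects are generated within 1 nm of a string rather than taken from data, and the function names are invented for the example. Only the counts (25 defects, 100 random configurations) and the 40 nm field of view follow the text.

```python
import random


def nearest_distance(point, strings_x):
    """Distance from a point to the nearest vertical 'string' x = const."""
    return min(abs(point[0] - sx) for sx in strings_x)


def mean_nearest(points, strings_x):
    """Mean nearest-string distance over a set of points."""
    return sum(nearest_distance(p, strings_x) for p in points) / len(points)


rng = random.Random(0)                # fixed seed for reproducibility
fov = 40.0                            # nm, field of view
strings_x = [5.0, 15.0, 25.0, 35.0]   # hypothetical loci of pi phase (nm)
n_defects, n_trials = 25, 100

# Hypothetical "measured" defects: scattered within 1 nm of a string.
defects = [(rng.choice(strings_x) + rng.uniform(-1.0, 1.0), rng.uniform(0.0, fov))
           for _ in range(n_defects)]
observed = mean_nearest(defects, strings_x)

# Null model: the same number of defects placed uniformly at random in the FOV.
null_means = []
for _ in range(n_trials):
    random_points = [(rng.uniform(0.0, fov), rng.uniform(0.0, fov))
                     for _ in range(n_defects)]
    null_means.append(mean_nearest(random_points, strings_x))

null_average = sum(null_means) / n_trials
fraction_below = sum(m <= observed for m in null_means) / n_trials
```

Here `fraction_below` plays the role of an empirical p-value: it is the fraction of random configurations whose mean defect-to-string distance is at least as small as the one observed.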


π phase winding and possible half-vortex in PDW. In search for
half-vortices in PDW, we analysed PDW phase $\Phi^{\Delta}_{Q}(\mathbf{r})$ in the vicinity of the 2π
topological defects from the induced N(r) modulation at 2Q. We
extracted the values along each contour surrounding the 2π topological
defects from the induced N(r) modulation at 2Q (Extended Data
Fig. 8a) and plotted an evolution of the PDW phase for each contour in
Extended Data Fig. 8b. Although no singularities that have a π phase
winding in $\Phi^{\Delta}_{Q}(\mathbf{r})$ are found, the PDW phases do change by π
along each contour, indicating the presence of possible half-vortices.


Data availability 


The data that support the findings of this study are available from the 
corresponding author upon reasonable request. 
Extended Data Fig. 1 | Analysis of the tip isotropy. a, Topography T(r) within
a 40 nm × 40 nm FOV. b, Real part of the Fourier transform of T(r). c, Line
profile Re T(q) along the line in the middle panel, representing nearly equal
Bragg peak height (difference is less than 7%).
Extended Data Fig. 2 | Estimation of the nanoflake tip geometry.
a, Autocorrelation of A(r). b, The line profile measured from the centre in a is
azimuthal-angle averaged. The size of the nanoflake on the tip is estimated
from the full-width at half-maximum and is around 3.3 nm.
Extended Data Fig. 3 | Possible process of the Josephson tunnelling.
Schematic representation of planar Josephson tunnelling in the presence of
two order parameters (OPs): homogeneous SC and PDW.

Extended Data Fig. 4 | Preliminary experimental data analysis.
a, c, Preliminary A(r) independently measured at 4.2 K on different pieces of
nearly optimally doped Bi2Sr2CaCu2O8+δ. b, d, The magnitude of the Fourier
transform of A(r) in a and c, respectively, representing early observations of
1/8 peaks marked by the red circles.


Extended Data Fig. 5 | Differential conductance map and its ratio. a, g(r, E = 54 meV) map. The eight-unit-cell CDW modulation, that is, the PDW-induced N(r)
modulation at Q, can be seen. b, Z(r, E = 54 meV) calculated by Z(r, E) = g(r, E)/g(r, −E).
Extended Data Fig. 6 | Cut-off-length dependence of |D_2Q(r)| and |Δ_Q(r)|. The left column shows |D_2Q(r)| at different cut-off lengths, similarly for the right
column for |Δ_Q(r)|.
Extended Data Fig. 7 | Distance analysis. a, A count distribution sorted by distances between the topological defects in the induced N(r) modulation at … map from Fig. 4c.
b, Average distribution of 100 configurations; within each configuration 25 points are randomly generated in the same FOV and distances
Chemical transformations determine the structure of a product, and therefore its
properties, which in turn affect complex macroscopic functions such as the metabolic
stability of pharmaceuticals or the volatility of perfumes. Therefore, reaction
selection can influence the success or failure of a candidate molecule to meet a
functional objective. The coupling of an amine with a carboxylic acid to form an amide
bond is the most popular chemical reaction used for drug discovery¹. However, there
are many other ways to connect these two common functional groups together. Here
we show computationally that amines and acids can couple via hundreds of
hypothetical yet plausible transformations, and we demonstrate experimentally the
application of a dozen such reactions. To investigate the contribution of chemical
transformations to properties, we developed a string-based notation and used an
enumerative combinatorics approach to produce a map of conceivable amine-acid
coupling transformations, which can be charted using chemoinformatic techniques.
We find that critical physicochemical parameters of the products, such as partition
coefficient and polar surface area, vary considerably depending on the
transformation chosen. Data mining the amine-acid coupling system produced here
should enable reaction discovery, which we demonstrate by developing an
esterification reaction found within the mapped space. Complex molecules with
distinct property profiles can also be discovered within the amine-acid coupling
system, as we show here via the late-stage diversification of drugs and natural
products.


The amide coupling is a robust and popular reaction used frequently
in chemical synthesis. The transformation couples an amine (1) and a
carboxylic acid (2) to form an amide (3) (Fig. 1a). Viewed in the context
of physicochemical properties, the transformation unites a hydrophilic
basic moiety (1) bearing two hydrogen bond donors, with a hydrophilic
acidic moiety (2) bearing one hydrogen bond donor and two hydrogen
bond acceptors, to generate a neutral product, 3. The amide product is
more lipophilic than the starting reagents, and has one hydrogen bond
donor and one hydrogen bond acceptor. Chemoinformatic studies have
linked physicochemical properties to functions as complex as toxicity”
and even successful market launch’, and so the ability to modulate the
numbers of hydrogen bond donors and hydrogen bond acceptors, the
partition coefficient logP, the molecular weight, and other properties
of a molecule via chemical synthesis is of high importance. Control
over physicochemical properties using chemical synthesis is typically
achieved by varying starting materials iteratively or in a combinatorial
manner‘, or by varying build-couple-pair reaction sequences to
introduce skeletal diversity®. We hypothesized that physicochemical
properties could be varied simply by switching the chemical transformation
while holding the building blocks constant. In our view, transformations
describe the mapping of atoms and bonds from starting
materials to products®, and can be described as reactions only when
accompanied by experimental reaction conditions. We reasoned that
a map of conceivable transformations would provide opportunities
in reaction discovery, especially given contemporary developments
in robotic’ ™ and algorithmic” * techniques for predicting reaction
conditions, in addition to presenting a strategy for chemical-space
exploration.

The amide coupling is used in one quarter of the reactions reported
in small-molecule pharmaceutical patents¹. As a result, there is an
abundance of available amine and acid building blocks. We questioned how
many other transformations exist for the amine-acid coupling pair.
Considering amine-acid couplings at the transformation level reveals
opportunities for reaction discovery. For example, instead of coupling
1 and 2 to form 3 (Fig. 1a), a decarboxylation could occur to give 4, or
a deamination could occur to give 5; likewise, a tandem
decarboxylation-deamination could occur to forge a carbon-carbon bond as in 6
(Fig. 1b). Compounds 7-9 are also possible, and the set of compounds
3-9 collectively reveals that 1 and 2 could couple to form acidic
products, basic products, neutral products and zwitterionic products. We
used enumerative combinatorics (Extended Data Fig. 1) to create simplified
molecular-input line-entry system (SMILES) strings for all products
arising from the coupling of two generic functional groups, A and B.

A notation was developed (Fig. 1c) to describe how functional groups
A or B can couple at the atoms of the functional group (A, B[C], or B[O],
when B is CO2H), or at the α or β carbon atoms. The notation also
describes how functional groups A, B or both may appear in or be absent
from the product. The transformation notation is written in the form
hA^XY/hB^XY, where h is the hybridization (2 = sp², 3 = sp³), X is the reacting
atom and Y is any additional modification including loss of A or B (−A,
−B), dehydration (−OH), or reduction (+H, +H2). All combinations of
sp³-sp³, sp³-sp², sp²-sp³ and sp²-sp² transformations from 1, 2 and their
sp² variants ethenamine and acrylic acid (Fig. 2) were included, leading
to 320 product substructures. Four of the transformations produced
the amides resulting from coupling sp³-sp³, sp³-sp², sp²-sp³ or sp²-sp²
amines and acids, respectively, but the vast majority of the enumerated
transformations are currently unknown as reactions. By charting the
amine-acid cross-coupling space, we aim to understand how chemical
transformations affect physicochemical properties.

The enumerated SMILES strings were used as inputs to a series of
chemoinformatic calculations. First, SMILES strings of the products
were computationally ionized at pH 7.4 (Supplementary Information),
and then used to calculate a range of physicochemical properties
(Extended Data Fig. 2). The full set of 320 products spans a range of
molecular weights from 54.1 to 120.2 g mol⁻¹, logP = −2.29 to 2.19 units,
hydrogen bond acceptors from 0 to 3, hydrogen bond donors from 0 to
2, polar surface area = 0 to 67.8 Å², fraction sp³ = 0 to 1, number of
rotatable bonds from 1 to 4, and a formal charge between −1 and 1. A
composite function of drug-like properties, the quantitative estimate of
drug-likeness”, ranged from 0.27 to 0.54. These findings demonstrate
that the choice of transformation can have a sizeable effect on properties.
In the context of drug discovery, it may be necessary to decrease
the number of hydrogen bond donors when optimizing a molecule
for the ability to cross the blood-brain barrier’, whereas it may be
necessary to increase the number of hydrogen bond donors to improve
aqueous solubility”. In this way, transformation mapping can enable
studies in property optimization.

Fig. 1 | Transformation enumeration strategy and notation. a, Ethylamine (1)
and propanoic acid (2) can couple to form amide 3, but can also couple to form
79 other products, including 4-9. b, Enumerating all combinations of sp² or sp³
hybridization for the 80 coupling patterns yields 320 product substructures.
c, A notation system for classifying transformations; see also Extended Data
Fig. 1.
[Panel titles: a, The amide coupling. b, Conceivable amine-acid coupling
transformations. c, Transformation notation: h = hybridization (sp² or sp³);
X = reacting atom (A, B[C], B[O], etc.); Y = modification (−A, −B, −OH, +H, +H2).]

Fig. 2 | Substructure search of 320 amine-acid coupling transformations
within 9,279 complex molecules from DrugBank. Each line represents the
appearance of a product substructure of a transformation in a complex
molecule, and the colour of the line represents the frequency of occurrence of
that substructure in that molecule. The dots around the periphery denote
which specific transformations appear in complex molecules (+)-noscapine
(green dots), (−)-quinine (purple dots) and (−)-sitagliptin (blue dots), which
connect to 112, 96 and 55 transformations, respectively. The numbers around
the periphery can be matched to a full list of transformation notation labels
found in Extended Data Table 1.
[Figure legend: substructure occurrence in each molecule, from 1 to 10+.]

Fig. 3 | Experimental exploration of the sp²-sp² amine-acid coupling
transformation space. a, Products with different property profiles (for
example, basic, acidic, neutral, lipophilic) can be produced from the same
two starting materials, 10 and 11 (top), by varying the transformation and
reaction conditions. NfOH, nonafluorobutanesulfonic acid; HFIP,
hexafluoroisopropanol; Cp*, pentamethylcyclopentadiene; 1,10-phen,
1,10-phenanthroline. b, Kernel density estimation plots show the range of
accessible molecular weight, partition coefficient (logP) and polar surface area
by coupling 10 to 11 via various amine-acid coupling transformations. Grey
lines denote the molecular weight, logP and polar surface area of 12-17.
The calculations use 13 and 15 in their charged protonation state.

The 320 product molecules from the combinatorial enumeration
were next used as substructures to search 9,279 pharmaceuticals and
natural products from the DrugBank database”. As can be seen in
Fig. 2, there is a high degree of connectivity between the products of
nearly every amine-acid coupling transformation and diverse
pharmaceuticals and natural products. Each connecting line represents
the successful identification of an enumerated product substructure
within a drug, and the colour of the line depicts the frequency with which a
substructure occurs in that molecule. The density of connections in
this system suggests that nearly every one of the 320 transformations
depicted on the periphery of Fig. 2 could find use in the synthesis of
complex molecules. As expected, the simple alkyl chain 6, formed
by coupling 1 to 2 (3NH2^α-A/3CO2H^α-B), occurs frequently as a product
substructure: 59,432 times among the DrugBank molecules (Extended
Data Fig. 3). Likewise, decarboxylative transformations to produce an
amine bound to an sp³- or sp²-carbon chain (such as 2NH2^A/2CO2H^α-B
to produce amines) appear in high frequency (Extended Data Fig. 4).
Some transformations, such as 2NH2^A/2CO2H^B[O], do not appear as
substructures in pharmaceuticals or natural products at all. This finding
can be rationalized because, in this case, the transformation produces
a hydroxylamine ester, which is probably too reactive a functionality
to persist in any of the complex molecules found in DrugBank. Analysing
the system in the other direction, novel retrosynthetic strategies
emerge by using amine-acid coupling transformations. For instance,
(+)-noscapine connects to 112 transformations, (−)-quinine connects to
96 transformations and (−)-sitagliptin connects to 55 transformations
(Extended Data Fig. 5), providing strategies for total synthesis. Our
analysis until this point focused solely on achiral bond connectivities.
In three-dimensional space, there are many more possible transformations,
because some transformations produce syn-diastereomers
whereas others produce anti-diastereomers (Extended Data Fig. 6).
These chiral coupling transformations sample a substantial assortment
of three-dimensional shapes (Extended Data Fig. 7).

To demonstrate our ability to control properties with atom-level
precision, several transformations were selected and realized
experimentally (Fig. 3a). We selected the amide coupling as well as
four known reactions, which maximized the diversity of properties
attainable from the coupling of p-toluidine (10) to o-toluic acid (11).
Using the free amine and acid directly, we executed the amide coupling
(2NH2^A/2CO2H^B[C]-OH) under Schotten-Baumann conditions to
give 12 in 91% yield. A B(C6F5)3-catalysed reductive N-alkylation” was
used to realize the 2NH2^A/2CO2H^B[C]+H2 transformation, giving amine 13
in 60% yield. A cyclized benzoxazole (14) was also generated from 12
under oxidative conditions”. Through activation of the amine as the
diazonium salt, an ortho-arylation (2NH2^α-A/2CO2H^β) and a corresponding
decarboxylative variation (2NH2^α-A/2CO2H^β-B) were achieved via Gooßen's
conditions”, giving 15 or 16 in 64% and 68% isolated yield, respectively.

[Fig. 4 scheme labels: 20, from 18 with MeOH, HCl, then 10 with NaNO2, HBF4,
then Pd(OAc)2, 61% yield. 26 (2NH2^A/3CO2H^α-B), from the NHPI ester of 23 with
Ru(bpy)3(PF6)2, CuBr, 2-(2,6-diOMePh)amino-2-oxoacetic acid, NEt3, blue LEDs,
87% yield. 25 (2NH2^α-A/3CO2H^B[O]), NaNO2, HBF4, then CuI, AgNO3, pyridine,
MeCN, 81% yield. 29 (3NH2^A/3CO2H^B[C]-OH), HATU, DIPEA, 85% yield.
30 (3NH2^A/3CO2H^B[C]+H2), B(C6F5)3, PMHS, 55% yield. 31 (3NH2^A/3CO2H^α),
MeOH, HCl, then LDA, TMSCl, NBS, then 28, DIPEA, 50% yield, 5.8:1 d.r.]

Fig. 4 | Late-stage diversification. Various transformations enable the
diversification of the complex molecules 18, 22 and 27. We performed a
virtual enumeration of other complex molecules, shown in Extended Data
Fig. 9, wherein the full transformation set was enumerated for four complex
molecule pairings to demonstrate that a wide range of properties can be
accessed, depending on which transformation is selected. LiHMDS, lithium
bis(trimethylsilyl)amide; HATU, 1-[bis(dimethylamino)methylene]-1H-1,2,3-
triazolo[4,5-b]pyridinium 3-oxide hexafluorophosphate; DIPEA, N,N-
diisopropylethylamine; NHPI, N-hydroxyphthalimide; PMHS,
polymethylhydrosiloxane; LDA, lithium diisopropylamide; TMSCl,
trimethylsilyl chloride; NBS, N-bromosuccinimide.

The mapping of amine-acid coupling space provides opportunities
to devise reaction methods, and we discovered one reaction within this
system. We reasoned that fruitful combinations of reagents, catalysts,
ligands, activating groups and directing groups could be identified to
realize hypothetical reactions. Towards this objective, we applied
high-throughput experimentation techniques to interrogate the coupling
of amine-acid derivatives using transition metal complexes, ligands
and additives (Extended Data Fig. 8). We discovered a reaction based
on the 2NH2^α-A/2CO2H^B[O] transformation, which generated ester 17 from
acid 11 and the diazonium salt of 10 under the influence of copper(I) iodide,
silver nitrate and pyridine. This reaction transforms a C-N bond into the
C-O bond of the ester. The product, 17, is a matched molecular pair to
the corresponding amide 12, but bears one fewer hydrogen bond donor.
Thus, starting from 10 and 11 and simply by varying reaction conditions,
we could produce the traditional amide (12), as well as closely related
analogues that are basic (13), acidic (15), neutral and lipophilic (14,
16 and 17), or neutral and hydrophilic (12). The products we obtained
experimentally span a substantial portion of the full range of molecular
weight, logP and polar surface area values achievable from all coupling
transformations of 10 and 11 (Fig. 3b), showcasing the utility of our
approach for fine-tuning molecular properties.

Many complex molecules contain an amine or an acid functional
group, so we anticipated that the application of diverse amine-acid
coupling transformations to late-stage diversification would enable
access to congeners with diverse property profiles. We used
chemoinformatics to evaluate late-stage diversification in the amine-acid
coupling system on a series of complex substrates (Extended Data Fig. 9).
Examination of the properties of the products reveals that the choice of
transformation can determine whether an analogue will pass or fail the
Lipinski rule of five”, leading to a range in desirability score (quantitative
estimate of drug-likeness)” of 0.31 to 0.70 for the couplings of the
acid-containing antibiotic levofloxacin with 3,5-dichloroaniline, and
0.29 to 0.61 for the pairing of yohimbine and α-methylbenzylamine
(Extended Data Fig. 9). To experimentally demonstrate the value of
the late-stage diversification concept (Fig. 4), enones derived from
yohimbine (18), sulfadoxine (22) and lithocholic acid benzyl ether (27)
were used as substrates. In the first instance, 18 was esterified and then
converted to amide 19 in 78% yield by heating with 10 in the presence of
lithium hexamethyldisilazide. Concurrently, 18 was esterified and then
β-arylated to produce 20 upon palladium-catalysed Heck-Matsuda
arylation using the diazonium salt of p-toluidine (10). We determined
that 18 could be converted to 19, then treated with magnesium in a
one-pot operation to introduce an additional stereocentre, as in 21.
Likewise, 22 and cyclohexane carboxylic acid (23) coupled to form amide
24, ester 25 by our copper(I)-promoted C-N to C-O 2NH2^α-A/3CO2H^B[O]
reaction, or amine 26 under decarboxylative conditions”. Finally, 27
served as a framework to produce amide 29, amine 30 or aminoester
31 via a one-pot 3NH2^A/3CO2H^α α-amination sequence using piperidine
(28). In this work we focused on amines and acids, but transformations
of any pair of functional groups can be enumerated to serve as inspiration
for the development of novel reaction methods and as a strategy
for chemical-space exploration. All of the transformations mapped
in the amine-acid coupling system could exist, but most are not yet
linked to viable reaction conditions, making this transformation space
a fertile proving ground for manual or automated reaction discovery.
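
The pass-or-fail check mentioned above can be illustrated with a short Python sketch. It uses the commonly quoted rule-of-five thresholds (molecular weight no greater than 500, logP no greater than 5, at most 5 hydrogen bond donors and at most 10 hydrogen bond acceptors); tolerating a single violation is a widespread convention that is assumed here, not something stated in the text. The class name, function names and property values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed descriptors for one product (values supplied by the caller)."""
    mol_weight: float  # g/mol
    logp: float        # partition coefficient
    hbd: int           # hydrogen bond donors
    hba: int           # hydrogen bond acceptors


def rule_of_five_violations(p: Properties) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [p.mol_weight > 500, p.logp > 5, p.hbd > 5, p.hba > 10]
    return sum(checks)


def passes_rule_of_five(p: Properties, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return rule_of_five_violations(p) <= max_violations


# Hypothetical analogues of one amine-acid pair made by different transformations.
analogue_a = Properties(mol_weight=361.4, logp=2.1, hbd=1, hba=7)
analogue_b = Properties(mol_weight=650.0, logp=6.2, hbd=6, hba=12)
results = {name: passes_rule_of_five(p)
           for name, p in {"a": analogue_a, "b": analogue_b}.items()}
```

Running the same check over every enumerated product of a given amine-acid pair would show which transformations keep an analogue inside the rule and which push it outside.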
Extended Data Fig. 1 | Workflow for enumeration of amine-acid
transformations. For a pair of coupling partners, we consider a reaction at the
functional group A (amine) and B (carboxylic acid oxygen, B[O] or carbon,
B[C]). Deamination reactions are noted as −A and decarboxylation reactions
are noted as −B. Enumeration following steps 1-3 produces 320
transformations. For the enumeration of all syn- and anti-diastereomers (step
4), consult also Extended Data Fig. 6.

Extended Data Fig. 2 | Kernel density estimate plots for 320 conceivable
amine-acid coupling transformations. Distribution of common physical
properties from the achiral amine-acid coupling of ethylamine, ethenamine,
propanoic acid and acrylic acid. MW, molecular weight; HBA, hydrogen bond
acceptor; HBD, hydrogen bond donor; PSA, polar surface area; FSP3, fraction
sp³; ROTB, rotatable bonds; FC, formal charge; QED, quantitative estimate of
drug-likeness.


[Chart title: Frequency of occurrence of each transformation in complex molecules from DrugBank]

Extended Data Fig. 3 | Number of DrugBank hits per transformation. This bar chart shows how many times a transformation is found in the DrugBank database.
Each number on the abscissa maps to a transformation listed in Extended Data Table 1.

Extended Data Fig. 4 | Decarboxylative transformations from the
enumeration scheme. Decarboxylative reactions that produce an amine
bound to an sp³ or sp² carbon chain appear in high frequency. These reactions
can be used to synthesize a large number of drugs contained in DrugBank. Each
transformation can be found by its corresponding number in Extended Data
Table 1. The colour scale is the same as in Fig. 2. rxn, reaction.
Extended Data Fig. 5 | Transformations from the enumeration scheme found
in specific drugs. The chord diagrams show connectivity of transformation
substructures as retrosynthetic disconnections in target molecules, with red
and blue dots highlighting the transformations shown at left in each panel.
a, Noscapine connects to 112 of the transformations. b, Quinine connects to 96
transformations. c, Sitagliptin connects to 55 transformations. The colour
scale is the same as in Fig. 2.

Extended Data Fig. 6 | Enumeration of regioisomers and diastereomers.
a, The transformation substructures enumerated in Fig. 3 are from the 320
achiral bond arrangements available from coupling 1, 2 and their sp² variants
ethenamine and acrylic acid. b, To sample three-dimensional and
regiochemical space, a B’ substituent was added as a differentiating
substituent. The B’ substituent may be any substituent, but is enumerated as
being distinct from the B substituent. Considering this regiochemical
enumeration increases the 320 achiral coupling transformations to 588.
c, Subsequent enumeration of all possible diastereomers leads to 1,005 chiral
coupling transformations. These 1,005 three-dimensional substructures were
used as inputs in the principal moment of inertia plot in Extended Data Fig. 7.


[Figure: principal moment of inertia plot of 1,005 amine-acid coupling transformations, with selected transformations labelled.]

Extended Data Fig. 7 | Principal moment of inertia plot of 1,005 amine-acid
coupling transformations incorporating stereochemistry and
regiochemistry. In this expanded three-dimensional space, regiochemistry
and stereochemistry of the transformations were considered. A total of 1,005
ways to connect an amine to an acid were found. The products presented a
diversity of properties and three-dimensional shapes. Each molecule is
coloured by its quantitative estimate of drug-likeness.
Extended Data Fig. 8 | High-throughput experimentation for the discovery
of a copper-promoted esterification reaction. a, An esterification reaction
discovered through reaction screening of transition metals with ligands and
additives. b, Recipe and well mapping. c, Calibration curve, for product 17
versus caffeine internal standard, used to convert the ultraperformance liquid

Extended Data Fig. 9 | Kernel density estimate plots of a series of complex
molecules as substrates in the amine-acid coupling system. The amine-acid
pair depicted was used as an input to combinatorial enumeration, and the
number of valid products identified is noted for each pairing. Distributions of
common physical properties are shown for each coupling set. Abbreviations
are as in Extended Data Fig. 2.
Extended Data Table 1 | Transformation labels

#   Transformation label   #   Transformation label   #   Transformation label   #   Transformation label
1 2NH2^A / 2CO2H^B[O]   81 3NH2^A / 2CO2H^B[O]   161 2NH2^A / 3CO2H^B[O]   241 3NH2^A / 3CO2H^B[O]
2 2NH2^α / 2CO2H^B[O]   82 3NH2^α / 2CO2H^B[O]   162 2NH2^α / 3CO2H^B[O]   242 3NH2^α / 3CO2H^B[O]
3 2NH2^β / 2CO2H^B[O]   83 3NH2^β / 2CO2H^B[O]   163 2NH2^β / 3CO2H^B[O]   243 3NH2^β / 3CO2H^B[O]
4 2NH2^A / 2CO2H^B[O]+H   84 3NH2^A / 2CO2H^B[O]+H   164 2NH2^A / 3CO2H^B[O]+H   244 3NH2^A / 3CO2H^B[O]+H
5 2NH2^α / 2CO2H^B[O]+H   85 3NH2^α / 2CO2H^B[O]+H   165 2NH2^α / 3CO2H^B[O]+H   245 3NH2^α / 3CO2H^B[O]+H
6 2NH2^β / 2CO2H^B[O]+H   86 3NH2^β / 2CO2H^B[O]+H   166 2NH2^β / 3CO2H^B[O]+H   246 3NH2^β / 3CO2H^B[O]+H
7 2NH2^A / 2CO2H^B[O]+H2   87 3NH2^A / 2CO2H^B[O]+H2   167 2NH2^A / 3CO2H^B[O]+H2   247 3NH2^A / 3CO2H^B[O]+H2
8 2NH2^α / 2CO2H^B[O]+H2   88 3NH2^α / 2CO2H^B[O]+H2   168 2NH2^α / 3CO2H^B[O]+H2   248 3NH2^α / 3CO2H^B[O]+H2
9 2NH2^β / 2CO2H^B[O]+H2   89 3NH2^β / 2CO2H^B[O]+H2   169 2NH2^β / 3CO2H^B[O]+H2   249 3NH2^β / 3CO2H^B[O]+H2
10 2NH2^A / 2CO2H^B[C]-OH   90 3NH2^A / 2CO2H^B[C]-OH   170 2NH2^A / 3CO2H^B[C]-OH   250 3NH2^A / 3CO2H^B[C]-OH
11 2NH2^α / 2CO2H^B[C]-OH   91 3NH2^α / 2CO2H^B[C]-OH   171 2NH2^α / 3CO2H^B[C]-OH   251 3NH2^α / 3CO2H^B[C]-OH
12 2NH2^β / 2CO2H^B[C]-OH   92 3NH2^β / 2CO2H^B[C]-OH   172 2NH2^β / 3CO2H^B[C]-OH   252 3NH2^β / 3CO2H^B[C]-OH
13 2NH2^A / 2CO2H^B[C]+H   93 3NH2^A / 2CO2H^B[C]+H   173 2NH2^A / 3CO2H^B[C]+H   253 3NH2^A / 3CO2H^B[C]+H
14 2NH2^α / 2CO2H^B[C]+H   94 3NH2^α / 2CO2H^B[C]+H   174 2NH2^α / 3CO2H^B[C]+H   254 3NH2^α / 3CO2H^B[C]+H
15 2NH2^β / 2CO2H^B[C]+H   95 3NH2^β / 2CO2H^B[C]+H   175 2NH2^β / 3CO2H^B[C]+H   255 3NH2^β / 3CO2H^B[C]+H
16 2NH2^A / 2CO2H^B[C]+H2   96 3NH2^A / 2CO2H^B[C]+H2   176 2NH2^A / 3CO2H^B[C]+H2   256 3NH2^A / 3CO2H^B[C]+H2
17 2NH2^α / 2CO2H^B[C]+H2   97 3NH2^α / 2CO2H^B[C]+H2   177 2NH2^α / 3CO2H^B[C]+H2   257 3NH2^α / 3CO2H^B[C]+H2
18 2NH2^β / 2CO2H^B[C]+H2   98 3NH2^β / 2CO2H^B[C]+H2   178 2NH2^β / 3CO2H^B[C]+H2   258 3NH2^β / 3CO2H^B[C]+H2
19 2NH2^A / 2CO2H^α   99 3NH2^A / 2CO2H^α   179 2NH2^A / 3CO2H^α   259 3NH2^A / 3CO2H^α
20 2NH2^A / 2CO2H^α-OH   100 3NH2^A / 2CO2H^α-OH   180 2NH2^A / 3CO2H^α-OH   260 3NH2^A / 3CO2H^α-OH
21 2NH2^A / 2CO2H^α+H   101 3NH2^A / 2CO2H^α+H   181 2NH2^A / 3CO2H^α+H   261 3NH2^A / 3CO2H^α+H
22 2NH2^A / 2CO2H^α+H2   102 3NH2^A / 2CO2H^α+H2   182 2NH2^A / 3CO2H^α+H2   262 3NH2^A / 3CO2H^α+H2
23 2NH2^α / 2CO2H^α   103 3NH2^α / 2CO2H^α   183 2NH2^α / 3CO2H^α   263 3NH2^α / 3CO2H^α
24 2NH2^α / 2CO2H^α-OH   104 3NH2^α / 2CO2H^α-OH   184 2NH2^α / 3CO2H^α-OH   264 3NH2^α / 3CO2H^α-OH
25 2NH2^α / 2CO2H^α+H   105 3NH2^α / 2CO2H^α+H   185 2NH2^α / 3CO2H^α+H   265 3NH2^α / 3CO2H^α+H
26 2NH2^α / 2CO2H^α+H2   106 3NH2^α / 2CO2H^α+H2   186 2NH2^α / 3CO2H^α+H2   266 3NH2^α / 3CO2H^α+H2
27 2NH2^β / 2CO2H^α   107 3NH2^β / 2CO2H^α   187 2NH2^β / 3CO2H^α   267 3NH2^β / 3CO2H^α
28 2NH2^β / 2CO2H^α-OH   108 3NH2^β / 2CO2H^α-OH   188 2NH2^β / 3CO2H^α-OH   268 3NH2^β / 3CO2H^α-OH
29 2NH2^β / 2CO2H^α+H   109 3NH2^β / 2CO2H^α+H   189 2NH2^β / 3CO2H^α+H   269 3NH2^β / 3CO2H^α+H
30 2NH2^β / 2CO2H^α+H2   110 3NH2^β / 2CO2H^α+H2   190 2NH2^β / 3CO2H^α+H2   270 3NH2^β / 3CO2H^α+H2
31 2NH2^A / 2CO2H^β   111 3NH2^A / 2CO2H^β   191 2NH2^A / 3CO2H^β   271 3NH2^A / 3CO2H^β
32 2NH2^A / 2CO2H^β-OH   112 3NH2^A / 2CO2H^β-OH   192 2NH2^A / 3CO2H^β-OH   272 3NH2^A / 3CO2H^β-OH
33 2NH2^A / 2CO2H^β+H   113 3NH2^A / 2CO2H^β+H   193 2NH2^A / 3CO2H^β+H   273 3NH2^A / 3CO2H^β+H
34 2NH2^A / 2CO2H^β+H2   114 3NH2^A / 2CO2H^β+H2   194 2NH2^A / 3CO2H^β+H2   274 3NH2^A / 3CO2H^β+H2
35 2NH2^α / 2CO2H^β   115 3NH2^α / 2CO2H^β   195 2NH2^α / 3CO2H^β   275 3NH2^α / 3CO2H^β
36 2NH2^α / 2CO2H^β-OH   116 3NH2^α / 2CO2H^β-OH   196 2NH2^α / 3CO2H^β-OH   276 3NH2^α / 3CO2H^β-OH
37 2NH2^α / 2CO2H^β+H   117 3NH2^α / 2CO2H^β+H   197 2NH2^α / 3CO2H^β+H   277 3NH2^α / 3CO2H^β+H
38 2NH2^α / 2CO2H^β+H2   118 3NH2^α / 2CO2H^β+H2   198 2NH2^α / 3CO2H^β+H2   278 3NH2^α / 3CO2H^β+H2
39 2NH2^β / 2CO2H^β   119 3NH2^β / 2CO2H^β   199 2NH2^β / 3CO2H^β   279 3NH2^β / 3CO2H^β
40 2NH2^β / 2CO2H^β-OH   120 3NH2^β / 2CO2H^β-OH   200 2NH2^β / 3CO2H^β-OH   280 3NH2^β / 3CO2H^β-OH
41 2NH2^β / 2CO2H^β+H   121 3NH2^β / 2CO2H^β+H   201 2NH2^β / 3CO2H^β+H   281 3NH2^β / 3CO2H^β+H
42 2NH2^β / 2CO2H^β+H2   122 3NH2^β / 2CO2H^β+H2   202 2NH2^β / 3CO2H^β+H2   282 3NH2^β / 3CO2H^β+H2
43 2NH2^A / 2CO2H^α-B   123 3NH2^A / 2CO2H^α-B   203 2NH2^A / 3CO2H^α-B   283 3NH2^A / 3CO2H^α-B
44 2NH2^α / 2CO2H^α-B   124 3NH2^α / 2CO2H^α-B   204 2NH2^α / 3CO2H^α-B   284 3NH2^α / 3CO2H^α-B
45 2NH2^β / 2CO2H^α-B   125 3NH2^β / 2CO2H^α-B   205 2NH2^β / 3CO2H^α-B   285 3NH2^β / 3CO2H^α-B
46 2NH2^A / 2CO2H^β-B   126 3NH2^A / 2CO2H^β-B   206 2NH2^A / 3CO2H^β-B   286 3NH2^A / 3CO2H^β-B
47 2NH2^α / 2CO2H^β-B   127 3NH2^α / 2CO2H^β-B   207 2NH2^α / 3CO2H^β-B   287 3NH2^α / 3CO2H^β-B
48 2NH2^β / 2CO2H^β-B   128 3NH2^β / 2CO2H^β-B   208 2NH2^β / 3CO2H^β-B   288 3NH2^β / 3CO2H^β-B
49 2NH2^α-A / 2CO2H^B[O]   129 3NH2^α-A / 2CO2H^B[O]   209 2NH2^α-A / 3CO2H^B[O]   289 3NH2^α-A / 3CO2H^B[O]
50 2NH2^β-A / 2CO2H^B[O]   130 3NH2^β-A / 2CO2H^B[O]   210 2NH2^β-A / 3CO2H^B[O]   290 3NH2^β-A / 3CO2H^B[O]
51 2NH2^α-A / 2CO2H^B[O]+H   131 3NH2^α-A / 2CO2H^B[O]+H   211 2NH2^α-A / 3CO2H^B[O]+H   291 3NH2^α-A / 3CO2H^B[O]+H
52 2NH2^β-A / 2CO2H^B[O]+H   132 3NH2^β-A / 2CO2H^B[O]+H   212 2NH2^β-A / 3CO2H^B[O]+H   292 3NH2^β-A / 3CO2H^B[O]+H
53 2NH2^α-A / 2CO2H^B[O]+H2   133 3NH2^α-A / 2CO2H^B[O]+H2   213 2NH2^α-A / 3CO2H^B[O]+H2   293 3NH2^α-A / 3CO2H^B[O]+H2
54 2NH2^β-A / 2CO2H^B[O]+H2   134 3NH2^β-A / 2CO2H^B[O]+H2   214 2NH2^β-A / 3CO2H^B[O]+H2   294 3NH2^β-A / 3CO2H^B[O]+H2
55 2NH2^α-A / 2CO2H^B[C]-OH   135 3NH2^α-A / 2CO2H^B[C]-OH   215 2NH2^α-A / 3CO2H^B[C]-OH   295 3NH2^α-A / 3CO2H^B[C]-OH
56 2NH2^β-A / 2CO2H^B[C]-OH   136 3NH2^β-A / 2CO2H^B[C]-OH   216 2NH2^β-A / 3CO2H^B[C]-OH   296 3NH2^β-A / 3CO2H^B[C]-OH
57 2NH2^α-A / 2CO2H^B[C]+H   137 3NH2^α-A / 2CO2H^B[C]+H   217 2NH2^α-A / 3CO2H^B[C]+H   297 3NH2^α-A / 3CO2H^B[C]+H
58 2NH2^β-A / 2CO2H^B[C]+H   138 3NH2^β-A / 2CO2H^B[C]+H   218 2NH2^β-A / 3CO2H^B[C]+H   298 3NH2^β-A / 3CO2H^B[C]+H
59 2NH2^α-A / 2CO2H^B[C]+H2   139 3NH2^α-A / 2CO2H^B[C]+H2   219 2NH2^α-A / 3CO2H^B[C]+H2   299 3NH2^α-A / 3CO2H^B[C]+H2
60 2NH2^β-A / 2CO2H^B[C]+H2   140 3NH2^β-A / 2CO2H^B[C]+H2   220 2NH2^β-A / 3CO2H^B[C]+H2   300 3NH2^β-A / 3CO2H^B[C]+H2
61 2NH2^α-A / 2CO2H^α   141 3NH2^α-A / 2CO2H^α   221 2NH2^α-A / 3CO2H^α   301 3NH2^α-A / 3CO2H^α
62 2NH2^α-A / 2CO2H^α-OH   142 3NH2^α-A / 2CO2H^α-OH   222 2NH2^α-A / 3CO2H^α-OH   302 3NH2^α-A / 3CO2H^α-OH
63 2NH2^α-A / 2CO2H^α+H   143 3NH2^α-A / 2CO2H^α+H   223 2NH2^α-A / 3CO2H^α+H   303 3NH2^α-A / 3CO2H^α+H
64 2NH2^α-A / 2CO2H^α+H2   144 3NH2^α-A / 2CO2H^α+H2   224 2NH2^α-A / 3CO2H^α+H2   304 3NH2^α-A / 3CO2H^α+H2
65 2NH2^β-A / 2CO2H^α   145 3NH2^β-A / 2CO2H^α   225 2NH2^β-A / 3CO2H^α   305 3NH2^β-A / 3CO2H^α
66 2NH2^β-A / 2CO2H^α-OH   146 3NH2^β-A / 2CO2H^α-OH   226 2NH2^β-A / 3CO2H^α-OH   306 3NH2^β-A / 3CO2H^α-OH
67 2NH2^β-A / 2CO2H^α+H   147 3NH2^β-A / 2CO2H^α+H   227 2NH2^β-A / 3CO2H^α+H   307 3NH2^β-A / 3CO2H^α+H
68 2NH2^β-A / 2CO2H^α+H2   148 3NH2^β-A / 2CO2H^α+H2   228 2NH2^β-A / 3CO2H^α+H2   308 3NH2^β-A / 3CO2H^α+H2
69 2NH2^α-A / 2CO2H^β   149 3NH2^α-A / 2CO2H^β   229 2NH2^α-A / 3CO2H^β   309 3NH2^α-A / 3CO2H^β
70 2NH2^α-A / 2CO2H^β-OH   150 3NH2^α-A / 2CO2H^β-OH   230 2NH2^α-A / 3CO2H^β-OH   310 3NH2^α-A / 3CO2H^β-OH
71 2NH2^α-A / 2CO2H^β+H   151 3NH2^α-A / 2CO2H^β+H   231 2NH2^α-A / 3CO2H^β+H   311 3NH2^α-A / 3CO2H^β+H
72 2NH2^α-A / 2CO2H^β+H2   152 3NH2^α-A / 2CO2H^β+H2   232 2NH2^α-A / 3CO2H^β+H2   312 3NH2^α-A / 3CO2H^β+H2
73 2NH2^β-A / 2CO2H^β   153 3NH2^β-A / 2CO2H^β   233 2NH2^β-A / 3CO2H^β   313 3NH2^β-A / 3CO2H^β
74 2NH2^β-A / 2CO2H^β-OH   154 3NH2^β-A / 2CO2H^β-OH   234 2NH2^β-A / 3CO2H^β-OH   314 3NH2^β-A / 3CO2H^β-OH
75 2NH2^β-A / 2CO2H^β+H   155 3NH2^β-A / 2CO2H^β+H   235 2NH2^β-A / 3CO2H^β+H   315 3NH2^β-A / 3CO2H^β+H
76 2NH2^β-A / 2CO2H^β+H2   156 3NH2^β-A / 2CO2H^β+H2   236 2NH2^β-A / 3CO2H^β+H2   316 3NH2^β-A / 3CO2H^β+H2
77 2NH2^α-A / 2CO2H^α-B   157 3NH2^α-A / 2CO2H^α-B   237 2NH2^α-A / 3CO2H^α-B   317 3NH2^α-A / 3CO2H^α-B
78 2NH2^β-A / 2CO2H^α-B   158 3NH2^β-A / 2CO2H^α-B   238 2NH2^β-A / 3CO2H^α-B   318 3NH2^β-A / 3CO2H^α-B
79 2NH2^α-A / 2CO2H^β-B   159 3NH2^α-A / 2CO2H^β-B   239 2NH2^α-A / 3CO2H^β-B   319 3NH2^α-A / 3CO2H^β-B
80 2NH2^β-A / 2CO2H^β-B   160 3NH2^β-A / 2CO2H^β-B   240 2NH2^β-A / 3CO2H^β-B   320 3NH2^β-A / 3CO2H^β-B


This table maps each transformation number from the periphery of the chord diagram in Fig. 2 to a transformation label. 


Article 


Discovery and characterization of an 
acridine radical photoreductant 


https://doi.org/10.1038/s41586-020-2131-1 


Received: 13 December 2019 


Accepted: 18 February 2020 


Published online: 1 April 2020 


Khadiza Begam, Andrew M. Moran, Barry D. Dunietz & David A. Nicewicz


Photoinduced electron transfer (PET) is a phenomenon whereby the absorption of
light by a chemical species provides an energetic driving force for an
electron-transfer reaction. This mechanism is relevant in many areas of
chemistry, including the study of natural and artificial photosynthesis,
photovoltaics and photosensitive materials. In recent years, research in the
area of photoredox catalysis has enabled the use of PET for the catalytic
generation of both neutral and charged organic free-radical species. These
technologies have enabled previously inaccessible chemical transformations and
have been widely used in both academic and industrial settings. Such reactions
are often catalysed by visible-light-absorbing organic molecules or
transition-metal complexes of ruthenium, iridium, chromium or copper. Although
various closed-shell organic molecules have been shown to behave as competent
electron-transfer catalysts in photoredox reactions, there are only limited
reports of PET reactions involving neutral organic radicals as excited-state
donors or acceptors. This is unsurprising because the lifetimes of doublet
excited states of neutral organic radicals are typically several orders of
magnitude shorter than the singlet lifetimes of known transition-metal
photoredox catalysts. Here we document the discovery, characterization and
reactivity of a neutral acridine radical with a maximum excited-state
oxidation potential of −3.36 volts versus a saturated calomel electrode, which
is similarly reducing to elemental lithium, making this radical one of the most
potent chemical reductants reported. Spectroscopic, computational and chemical
studies indicate that the formation of a twisted intramolecular
charge-transfer species enables the population of higher-energy doublet excited
states, leading to the observed potent photoreducing behaviour. We demonstrate
that this catalytically generated PET catalyst facilitates several chemical
reactions that typically require alkali metal reductants and can be used in
other organic transformations that require dissolving metal reductants.


Our laboratory, as well as others, has published numerous examples
highlighting the diverse reactivity of acridinium salts, such as
Mes-Acr⁺BF₄⁻ (Mes, mesityl; Acr, acridinium), as photooxidation catalysts in
the excited state (Mes-Acr⁺*; Fig. 1a). Upon absorption of visible light,
the corresponding excited state of the acridinium salt is populated
and may be quenched via electron transfer from an electrochemically
matched substrate, resulting in the formation of an acridine radical
(Mes-Acr•; Fig. 1a). In past work using acridinium photoredox catalysts,
this radical was typically oxidized to regenerate the parent acridinium
and close a catalytic cycle. During previous mechanistic studies
conducted by our laboratory, it was noted that solutions of Mes-Acr•
generated via reduction of Mes-Acr⁺BF₄⁻ with cobaltocene were indefinitely
stable under oxygen-free conditions and possessed two major
absorption features (at 350-400 nm and 450-550 nm; Fig. 1b). These
observations led us to explore the photophysical behaviour of this
radical, with a focus on identifying potential PET behaviour. Previous
studies have detailed the in situ generation and excitation of stable
cation and anion radical species and their use in catalytic reactions,
indicating the potential feasibility of this strategy and prompting our
studies of the photophysical behaviour of Mes-Acr•.

Fig. 1 | Mechanistic studies of Mes-Acr radical. a, Reduction potential of
various elemental alkali metals compared to the peak reducing potential of
Mes-Acr•. E*ox, excited-state oxidation potential; E*red, excited-state
reduction potential; E₁/₂, half-wave reduction potential. b, Absorbance and
emission (excitation, 400 nm) profiles for Mes-Acr• in MeCN (5 mM, 1 mm path
length). c, SOMO and LUMO + 1 visualizations for Mes-Acr•. d, Transient
absorption spectra of Mes-Acr• (2.5 mM, THF, 1 mm path length) collected with
a 250-fs pump pulse centred at 400 nm. e, Excited-state energies calculated
using the SRSH-PCM/TD-DFT method for Mes-Acr• (left) and frontier orbital plot
showing the donor and acceptor density for the TICT excited state (right).
f, Debromination reaction of brominated acridinium derivative giving
circumstantial evidence for the TICT state. Mes, mesityl; DIPEA,
N,N-diisopropylethylamine; t-Bu, tert-butyl.

Upon investigation of the excited-state dynamics of Mes-Acr•, we
found that there are two main excited states, tentatively assigned as a
lower-energy doublet (D₁) and a higher-energy twisted intramolecular
charge-transfer (TICT) state. The excited-state energy for the doublet
excited state of Mes-Acr• is estimated by averaging the energies
of the lowest-energy absorption maximum and the highest-energy
emission observed upon excitation at 484 nm. The energy of the proposed
higher-order excited state is estimated by averaging the energies
of the emission maximum near 490 nm and the maximum of the
corresponding excitation spectrum monitored at this wavelength
(see Supplementary Information for details of the excited-state energy
calculations). Estimation of excited-state energies in this fashion gives
values of 2.31 eV for the energy of the proposed D₁ excited state and
2.76 eV for the corresponding higher-energy excited state (Fig. 1e).
Using the known electrochemical potential of Mes-Acr•, the
excited-state oxidation potentials of these states were estimated to be
−2.91 V and −3.36 V, respectively, with respect to a saturated calomel
electrode (vs SCE). To our knowledge, these values represent some of
the most negative excited-state oxidation potentials reported for an
organic molecule.

Before we proceed to discuss the calculated excited-state energies,
we consider the key orbitals involved in the low-lying excited states.
We find that the singly occupied molecular orbital (SOMO) density is
localized on the acridine core, and LUMO + 1 (where LUMO is the lowest
unoccupied molecular orbital) is localized on the N-phenyl ring
of Mes-Acr• (Fig. 1c). On the basis of this observation of small spatial
overlap between these two orbitals, we expect to find a relatively
low-lying excitation of an intramolecular charge-transfer state. To further
probe the excited-state behaviour of Mes-Acr•, we performed transient
absorption experiments (Fig. 1d). At early pump-probe delay times
in tetrahydrofuran (THF), the ground state of Mes-Acr• is bleached
(change in absorption, ΔA < 0) and excited-state absorbance resonances
(ΔA > 0) with maxima at 550 nm and ~650 nm are observed.
Aromatic radical anions are known to exhibit broad absorbance peaks
in the 600-800 nm range, as do aqueous solvated electrons. The
observed excited-state absorbance signal at ~550 nm also matches the
absorbance profile expected for a general acridine excitonic structure.
Simple first-order decay to the ground state occurs after ~100 ps, matching
well with previously reported values for excited-state lifetimes of
organic radicals. Time-dependent density functional theory (TD-DFT)
calculations indicate that other red-shifted absorptions present are well
matched with energies calculated for a general acridinium structure.
These spectral features support the formation of a charge-transfer
state possessing both aromatic radical anion and acridinium features,
as expected for the proposed TICT state.
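
The excited-state oxidation potentials quoted above (−2.91 V and −3.36 V vs SCE) are consistent with the usual estimate in which the excited-state energy is subtracted from the ground-state potential. The short sketch below illustrates that arithmetic only. The ground-state value of −0.60 V vs SCE is not stated in the text; it is back-calculated here from the quoted numbers and should be treated as an assumption, as should the function and variable names.

```python
def excited_state_oxidation_potential(e_ground_v: float, e00_ev: float) -> float:
    """Estimate E*ox (V) as the ground-state potential minus the excited-state energy.

    For a one-electron process, an excited-state energy in eV shifts the
    potential by the same number of volts.
    """
    return e_ground_v - e00_ev


# Assumption: ground-state potential back-calculated from the values in the text
# (-2.91 V + 2.31 eV = -0.60 V); it is not quoted directly in the article.
E_GROUND_V_VS_SCE = -0.60

# Excited-state energies estimated spectroscopically in the text (eV).
STATE_ENERGIES_EV = {"D1": 2.31, "TICT": 2.76}

potentials = {
    state: round(excited_state_oxidation_potential(E_GROUND_V_VS_SCE, energy), 2)
    for state, energy in STATE_ENERGIES_EV.items()
}

for state, potential in potentials.items():
    print(f"{state}: E*ox ~ {potential:.2f} V vs SCE")
```

Under this assumption the two states differ in reducing power by exactly their energy gap (0.45 V), which is why populating the higher-energy TICT state matters for the observed reactivity.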

To better understand the effect of rotation of the N-phenyl ring on the
excited-state energetics of the acridine radical, we employ the recently
reported polarization-consistent TD-DFT-based framework for obtaining
excited-state energies of solvated molecular systems. The approach
addresses dielectric polarization consistently by invoking the same
dielectric constant in the screened range-separated hybrid (SRSH)
functional parameters and in the polarizable continuum model (PCM).
SRSH-PCM was benchmarked well in the calculation of charge-transfer state
energies of solvated donor-acceptor complexes and in the analysis of
the spectral trends of several pigments with increased accuracy, where
conventional TD-DFT calculations fail to reproduce the observed trends
(see Supplementary Information for full computational details).
The calculated doublet excited-state energies for Mes-Acr• agree
very well (within 0.1 eV) with values determined through spectroscopic
measurements for both the absorption and emission spectra (Fig. 1e).
The calculated lowest-energy D₁ state, with an excited-state energy of
2.29 eV, agrees with the experimentally determined D₁ value of 2.31 eV.
Additionally, two excited states with substantial charge-transfer character
were identified and the corresponding energies were calculated to
be 2.75 eV and 2.78 eV, matching closely the estimated spectroscopic values
for the proposed TICT state energy of 2.76 eV. As such, the identified
D₁ (2.29 eV) state is assigned as an untwisted excitonic state, whereas the
calculated 2.78 eV state is assigned as a TICT state. These excited-state
energies also correspond well with previously reported excited-state
energies for neutral radical species. Additionally, visualizations of the
geometries of the corresponding TICT state indicate sizeable rotation
of the N-phenyl ring (36°) relative to the more planarized geometry
of the D₁ state, providing further evidence of the profound effects of
N-phenyl rotation on excited-state energy.
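
As a rough consistency check on the numbers in this paragraph, the sketch below compares the calculated and spectroscopic energies and converts the 390-nm irradiation wavelength used later in the article into a photon energy. It is an illustration only: the dictionary names are hypothetical, and the conversion uses the standard relation E = hc/λ with hc ≈ 1239.84 eV nm.

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV nm


def photon_energy_ev(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a photon energy in electronvolts."""
    return HC_EV_NM / wavelength_nm


# Energies quoted in the text (eV).
spectroscopic_ev = {"D1": 2.31, "TICT": 2.76}
calculated_ev = {"D1": 2.29, "TICT": 2.78}

# Absolute deviation between calculation and experiment for each state.
deviations_ev = {
    state: abs(calculated_ev[state] - spectroscopic_ev[state])
    for state in spectroscopic_ev
}

# Photon energy of the 390-nm light used for irradiation.
photon_390_ev = photon_energy_ev(390.0)

for state, deviation in deviations_ev.items():
    print(f"{state}: |calc - exp| = {deviation:.2f} eV")
print(f"390 nm photon: {photon_390_ev:.2f} eV")
```

Both deviations fall well inside the 0.1 eV agreement stated in the text, and a 390-nm photon carries more energy than either excited state requires, so excitation at that wavelength is energetically able to reach the higher-lying TICT state.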

With the electronic and excited-state behaviour of Mes-Acr• elucidated,
we sought to use this species as a catalytic reductant in a photoredox
manifold. Previous work in reductive photoredox catalysis
has established the reduction of aryl halides as a common benchmark
reaction. Furthermore, the extremely potent reducing behaviour
of the acridine radical should enable the reduction of a wide range
of electronically diverse substrates. N,N-Diisopropylethylamine (DIPEA) was
identified as a suitable single-electron reductant for the generation of
Mes-Acr• from Mes-Acr⁺BF₄⁻ in situ. Following excitation, Mes-Acr⁺BF₄⁻
undergoes single-electron reduction via electron transfer from DIPEA,
generating the desired radical Mes-Acr•. To chemically probe the possibility
of charge transfer to the N-phenyl ring, brominated acridinium
(1) was prepared. In the presence of 3 equiv. DIPEA, 1 was completely
converted to a mixture of debrominated acridinium (1a) and hydroacridine
(1b) following irradiation at 390 nm for 18 h (Fig. 1f). As aryl halide
radical anions are known to quickly fragment to yield the corresponding
aryl radicals, this experiment is indicative of the formation of radical
anion character localized on the N-phenyl ring during excitation.

To evaluate the competency of this radical species as a catalytic 
reductant, conditions for the reductive dehalogenation of aryl halides 
were developed (Fig. 2a). A variety of both electron-rich (6-13) and 
electron-poor (14, 15) aryl bromides afforded the desired hydrodebro- 
minated products in good to excellent yields (nuclear magnetic reso- 
nance, NMR, yields of products were taken using hexamethyldisiloxane 
as an internal standard). It is of note that reductively recalcitrant aryl 
chlorides also participated efficiently in this reaction, in contrast to 
previously reported methods that are only effective for electron-poor 
(under visible-light irradiation) or moderately electron-rich aryl
chlorides (under UVA irradiation). A variety of both electron-donating
(16-20) and electron-withdrawing (21-24) substituents were tolerated, 
with only slightly reduced yields in the case of electron-poor substrates. 


Substrates bearing ketone (30), carboxylic acid (31) and alcohol (28) 
functionalities all afforded the desired hydrodechlorinated products 
in good to excellent yield. Medicinally relevant pyridine (25, 26) and 
aryl carbamate (27) derivatives were also efficient substrates for this 
transformation. When substrate (23), which bears a trifluoromethyl 
substituent, was subjected to the reaction conditions, partial hydro- 
defluorination (5%) that yielded the corresponding difluoromethyl 
derivative was observed in addition to hydrodechlorination. In all 
other examples, no Birch-type products resulting from overreduc- 
tion were detected. The bis-reduction of polyhalogenated compounds 
(9a) and (9b) gave the corresponding bis-hydrodebromination (9) 
and bis-hydrodechlorination products in 58% and 46% yield, respec- 
tively. For compound (9b), 49% yield of the product resulting from 
mono-hydrodechlorination (9c) was observed in addition to the fully 
dechlorinated product. 

On the basis of prior work in reductive dehalogenation, the following
mechanism is proposed (Fig. 2b). Following excitation, Mes-Acr⁺BF₄⁻
engages in single electron transfer with the tertiary amine
reductant DIPEA, generating Mes-Acr• and the corresponding amine
cation radical. Mes-Acr• is then excited by 390-nm light, populating a
combination of highly reducing D₁/TICT excited states, and undergoes
electron transfer with an electronically matched aryl halide, generating
an arene radical anion and reforming Mes-Acr⁺BF₄⁻. The arene
radical anion then fragments, yielding an aryl radical. The resulting
aryl radical abstracts a hydrogen atom from the amine cation radical,
yielding the desired product as well as the corresponding iminium salt.
Deuterium-labelling studies confirmed the amine cation radical as the
primary source of hydrogen atoms in this system (see Supplementary
Information section 8).

The reductive detosylation of amines was identified as another 
possible transformation, which may be facilitated by Mes-Acr’ (Fig. 3). 
Typically, strong-acid, dissolving-metal (Li/Mg) or low-valent
transition-metal reductions are employed in detosylation reactions.
A variety of electronically diverse tosylated aniline derivatives were 
smoothly converted to the desired free anilines in moderate-to-excel- 
lent yield. Interestingly, substrates containing aryl halides were toler- 
ated under the reaction conditions. As this reaction is conducted at 
a much lower concentration of substrate compared to the reductive 
dehalogenation method (0.1 M versus 0.5 M), the observed lack of aryl
halide reduction may be a function of concentration. Esters (43, 73), 
free carboxylic acids (44), ketones (48) and free alcohols (58) were 
tolerated under the reaction conditions, showing the high functional- 
group tolerance of this method relative to methods relying on harsh 
dissolving-metal conditions. Benzylic (52) and secondary alkyl amines 
(45, 53, 65-68) were efficient substrates for this transformation as 
well. Medicinally relevant heterocycles—including pyridines (59), 
indoles (58), pyrroles (62), pyrrolidines (67), indazoles (63), benzi- 
midazoles (64) and morpholines (65)—were deprotected in good-to- 
excellent yields, with no reduction of the aromatic system observed 
in all cases. Of note is the ability of this method to chemoselectively 
and efficiently deprotect tosyl amines over mesyl-protected amines, 
as shown by the reaction of substrate 51, yielding the desired deto- 
sylation product in 61% yield with no observed cleavage of the mesyl 
amine. Additionally, the reaction performed well with 1.28 g of starting 
tosylamine, with substrate 64 giving 92% yield when the desired deto- 
sylation was conducted in a standard round-bottom flask irradiated 
with light-emitting diode lamps (see Supplementary Information for 
experimental details). 

In conclusion, an acridine radical generated in situ from single-electron
reduction of an acridinium derivative may act as a potent
single-electron reductant upon excitation with 390-nm light. Spectroscopic
and computational investigations indicate the formation of at
least two distinct excited states, one of which may be characterized as
a TICT state. The development of chemoselective dehalogenation and
desulfonylation reactions using Mes-Acr• complements the well-known
oxidative chemistry associated with acridinium salts and highlights
the potential for the development of other types of reaction based on
excitation of organic radicals.

The mid-Cretaceous period was one of the warmest intervals of the past
140 million years, driven by atmospheric carbon dioxide levels of around
1,000 parts per million by volume. In the near absence of proximal geological
records from south of the Antarctic Circle, it is disputed whether polar ice
could exist under such environmental conditions. Here we use a sedimentary
sequence recovered from the West Antarctic shelf (the southernmost Cretaceous
record reported so far) and show that a temperate lowland rainforest
environment existed at a palaeolatitude of about 82° S during the
Turonian-Santonian age (92 to 83 million years ago). This record contains an
intact 3-metre-long network of in situ fossil roots embedded in a
mudstone matrix containing diverse pollen and spores. A climate model simulation
shows that the reconstructed temperate climate at this high latitude requires a
combination of both atmospheric carbon dioxide concentrations of 1,120-1,680 parts
per million by volume and a vegetated land surface without major Antarctic
glaciation, highlighting the important cooling effect exerted by ice albedo under high
levels of atmospheric carbon dioxide.


The Cretaceous Period (144-66 million years ago (Ma)) hosted some
of the warmest intervals in Earth’s history, particularly during the
Turonian to Santonian stages (93.9-83.6 Ma). At that time, atmospheric
carbon dioxide (CO₂) concentrations were reconstructed to be
around 1,000 parts per million by volume (ppmv), and average
annual low-latitude sea surface temperatures probably reached ~35 °C,
with only a minor bihemispheric temperature gradient extending
polewards from palaeolatitudes between 50 and 60° N.
Only small to medium-sized ice sheets may have existed and global
sea level was up to 170 m higher than at present.

Records documenting the Antarctic terrestrial environment during
this mid-Cretaceous warmth are sparse and particularly rare
south of the palaeo-Antarctic Circle. Such records, however, are
critical to constrain state-of-the-art Late Cretaceous climate models
for predicting the magnitude of atmospheric CO₂ concentrations
and their effectiveness in inhibiting the build-up of major ice sheets.

Here we reconstruct mid-Cretaceous terrestrial environmental
conditions in West Antarctica by combining micro- and
macropalaeontological, sedimentological, inorganic and organic geochemical,
mineralogical and palaeomagnetic data, as well as X-ray computed
tomography (CT) imagery, obtained from drill cores recovered
from a site within the Pine Island cross-shelf trough in the Amundsen
Sea Embayment (ASE), West Antarctica (Fig. 1a). Site PS104_20-2
(73.57° S, 107.09° W; 946 m water depth) was drilled during RV Polarstern
expedition PS104 in 2017. The Pine Island Trough extends
from the modern fronts of the Pine Island and Thwaites glaciers, and was
eroded into the ASE shelf during repeated advances of a West Antarctic
Ice Sheet palaeo-ice stream throughout the Miocene to Pleistocene
epochs. On the inner to middle continental shelf, glacial erosion
combined with tectonic uplift exposed seaward-dipping sedimentary
strata of postulated Cretaceous to Miocene age near the seafloor
(Fig. 1b). Widespread till cover on the shelf previously prevented
sampling of these strata using conventional coring techniques.
Deployment of the remotely operated seafloor drill rig MARUM-MeBo70
enabled drilling to 30.7 m below seafloor (mbsf) into
the seabed to recover the dipping strata (Figs. 1 and 2).
Fig. 1 | Setting of MARUM-MeBo70 drill site PS104_20-2 on the ASE shelf.
a, The present-day location of West Antarctica is shown in relation to the
reconstructed boundary between continental and oceanic crust at 84 Ma
(thick black lines). The pre-break up suture (dashed white line)
indicates the position of the reconstructed Zealandian and West Antarctic
continental and oceanic crust before initial break-up starting at ~90 Ma.
Orange circles mark the locations of other outcrops of mid-Cretaceous
sedimentary strata. b, Seismic reflection profile NBP9902-11 (A-B)
crossing drill site PS104_20-2. The orange bar indicates the drilled core length.
The profile position is indicated in a. The drill hole penetrated Amundsen Sea
shelf unconformity ASS-u1, which separates seismic units ASS-1 and ASS-2.
The interpretation of seismostratigraphic units and unconformities is
based on both previous work and this study. Pitt Island belongs to the
Chatham Island group of New Zealand. PB, Prydz Bay; ChR, Chatham Rise. Shelf
bathymetry and sub-ice topography data are taken from previously published
sources.


Lithology and stratigraphy

Beneath a few metres of glacimarine and reworked glacial sediments,
MARUM-MeBo70 penetrated an occasionally stratified but microfossil-barren
~17-24-m-thick quartzitic gravelly sandstone with uranium-lead
(U-Pb) dates on apatite and zircon grains (see Methods) constraining
its maximum depositional age to ~40 Myr in the late Eocene (Extended
Data Fig. 1). Cores 9R and 10R recovered strata from 26.3 mbsf to the
base of the hole. At ~26.8 mbsf, a prominent thin (5 cm) layer of indurated
lignite fragments separates the overlying sandstone unit from a
>3-m-thick, palynomorph-rich, laminated to stratified carbonaceous
mudstone below. This mudstone contains an intact and continuous
network of fossil plant roots that reaches down to at least 30 mbsf
(Fig. 2; Supplementary Video 1).

Based on New Zealand's biostratigraphic ranges, the presence of
the pollen taxon Phyllocladidites mawsonii (nearest living relative
(NLR): Lagarostrobos, Huon Pine) and the absence of both Nothofagidites
(NLR: Nothofagus, Southern Beech) and Forcipites sabulosus
within the carbonaceous mudstone indicate its deposition during the
mid-Cretaceous (Turonian-Santonian; ~92-83 Ma, PM1a-subzone)
(Extended Data Fig. 2; Extended Data Tables 1 and 2). Abundant pollen
of conifer trees (for example, Podocarpidites, Trichotomosulcites) and
tree ferns (Cyathidites) and the presence of accessory taxa such as
Ruffordiaspora ludbrookiae and Tricolpites spp. in our assemblage
resemble the uppermost strata of the Turonian-Santonian Tupuangi
Formation on Pitt Island, New Zealand, dated to 92-89 Ma
(Extended Data Table 3). However, the regular occurrence of pollen
of the family Proteaceae, including Beauprea-type pollen (for example,
Peninsulapollis gilii, Beaupreaidites), which are absent from the
Tupuangi Formation, suggests that the ASE core is slightly younger
than 89 Myr old. Recent molecular phylogenetic reconstructions indicate
an early Antarctic-Southeastern Australian origin of Beauprea
(~88 Ma), whereas the oldest palynological records of these angiosperm
fossils on Antarctica and Australia date back to 81.4 Ma and 83.8 Ma,
respectively.

These biostratigraphic age estimates are consistent with palaeomag- 
netic data obtained from discrete sediment samples showing normal 
polarity, expected for deposition during the ‘Cretaceous Normal Polarity
Superchron’ (C34n; 121-83 Ma) (see Methods). The layer of
indurated lignite and the underlying carbonaceous mudstone show 
very similar pollen assemblages, which indicate a similar age and 
palaeoenvironment for both units (Fig. 2; Extended Data Fig. 2). 


Fig. 2 | Multi-proxy parameter reconstruction of cores 9R and 10R at site
PS104_20-2. The MARUM-MeBo70 seafloor drill rig drilled 30.7 m into the
seafloor and recovered 5.91 m of core length. The lower ~3 m consists of a fossil
root-bearing mudstone with an ~5-cm-thick layer of brecciated lignite on top
(from ~26.77 mbsf downwards), both of Turonian-Santonian age. A Late Eocene
or younger quartzitic gravelly sandstone overlies the lignite. The upper lignite
boundary defines the impedance contrast between the underlying mudstone
and overlying gravelly sandstone and probably coincides with the prominent
regional unconformity ASS-u1 (see the thick red line in Fig. 1b). Note the core
break between 9R and 10R at 27.15 mbsf. LS, linescan; CT, X-ray computed
tomography; Cl/St/Sd, clay/silt/sand (n=6); TOC, total organic carbon (n=16);
Gy/An/Pt/Br, gymnosperms/angiosperms/pteridophytes/bryophytes (n=7);
X, barren palynomorph samples (n=9); Halite (n=16); Bulk sediment
neodymium (εNd) values (±2 s.d. = 0.27) and strontium (⁸⁷Sr/⁸⁶Sr) ratios (±2 s.e.;
see Source Data) (n=7) (median centre values) (see Methods); Fe(Ca), iron
carbonate (n=16); TAR, ratio of terrestrial and aquatic-sourced n-alkanes
(n=14); C:N (mol), molar ratio of TOC/TN (n=16). *Zircon U-Pb age (45.5 million
years (Myr)) (n=1). Inferred ages are based on palynomorph biostratigraphy for
the mudstone and U-Pb ages of apatite and zircon grains for the sandstone (see
the main text).


Turonian-Santonian position of the record

To assess the palaeoclimatic importance of this record, we determined
the palaeogeographical position of site PS104_20-2 at 90 Ma. Today,
the site is located near the Pacific continental margin of West Antarctica,
about 250 km away from the modern boundary between continental
and oceanic crust (Fig. 1). At the time of sediment deposition,
between 93 and 83 Ma, the continent of Zealandia started to rift and
separate from West Antarctica. We applied a relative plate reconstruction
between Zealandia and West Antarctica for the middle
Cretaceous using the GPlates (version 2.2) plate reconstruction tool with
up-to-date rotation parameters of the South Pacific realm. This
resulted in a 736 km great-circle distance (265 km north-south distance)
between the drill site and the hitherto southernmost mid-Cretaceous
terrestrial palaeoenvironmental record on Pitt Island on Chatham
Rise, New Zealand. The close-fit reconstruction at 90 Ma indicates a
wide rift zone between Zealandia and West Antarctica, just before the
initiation of continental break-up. In a previous study, a mean
palaeomagnetic pole position at 100 Ma of 75.7° S and 135.9° W with a 95%
confidence radius of 3.8° for Marie Byrd Land was determined from 19
rock sample sites. By accounting for the great-circle distance of 7.84°
to our drill site and rotating points on the East Antarctic polar wander
path into the Marie Byrd Land reference frame, we derive a core site
palaeolatitude of 81.9° S at 90 Ma. The uncertainty in this position is
estimated to be not larger than the maximum 95% confidence radius
of 5.9° of the respective part of the polar wander path.
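
The 7.84° great-circle distance quoted above can be reproduced from the coordinates given in the text with a standard haversine calculation. The sketch below is an illustration under stated assumptions, not the authors' workflow: it uses the present-day site coordinates (73.57° S, 107.09° W) and the 100 Ma pole (75.7° S, 135.9° W) on a spherical Earth, and the function and variable names are hypothetical. It does not attempt the additional polar-wander-path rotation that leads to the published 81.9° S value at 90 Ma.

```python
import math


def angular_distance_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (angular) distance in degrees between two points on a sphere.

    Latitudes and longitudes are in degrees; south and west are negative.
    Uses the haversine formula, which is numerically stable for small angles.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return math.degrees(2 * math.asin(math.sqrt(a)))


# Coordinates taken from the text.
POLE_100MA = (-75.7, -135.9)     # mean palaeomagnetic pole for Marie Byrd Land
SITE_PS104 = (-73.57, -107.09)   # drill site PS104_20-2 (present-day position)

separation_deg = angular_distance_deg(*POLE_100MA, *SITE_PS104)

# Latitude of the site relative to the 100 Ma pole alone (no further rotation).
latitude_relative_to_pole = 90.0 - separation_deg

print(f"pole-to-site distance: {separation_deg:.2f} degrees")
print(f"latitude relative to 100 Ma pole: {latitude_relative_to_pole:.1f} degrees S")
```

Measured against the 100 Ma pole alone, the site sits at roughly 82° S, close to the 81.9° S that the article derives for 90 Ma after the additional rotation, and well inside the stated 5.9° uncertainty.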


Palaeoenvironment 

The indurated lignite layer as well as the laminated to stratified
carbonaceous mudstone comprising the fossil plant roots in cores 10R
and lower 9R at site PS104_20-2 contain a highly diverse and entirely
terrestrial palynomorph assemblage of more than 62 pollen and spore 
taxa (Fig. 2; Extended Data Figs. 2, 3; Extended Data Table 3). The 
absence of palynomorphs with different stratigraphic ranges or vary- 
ing thermal maturity suggests that this purely terrestrial microfossil 
assemblage has not been reworked. The assemblage is dominated by 
pollen of the conifer tree families Podocarpaceae and Araucariaceae 
with abundant ferns, including the tree ferns Cyathea, documenting 
the initial stages of an austral temperate rainforest (Fig. 2; Extended 
Data Fig. 2; Extended Data Table 2). The presence of the heterocyst 
glycolipids HG;, triol and keto-diol (Extended Data Fig. 4; see Meth- 
ods) also indicates that benthic cyanobacterial mats colonized fresh 
water bodies within this temperate rainforest, providing additional 
evidence for the development of a highly complex ecosystem in the 
ASE during the Turonian-Santonian. In combination with published 
palaeo-topographic and palaeo-tectonic information, the
different taxa and their bioclimatic importance (see Methods) were 
combined and visualized to create Fig. 3. Members of the Proteaceae 
family presumably formed a flowering shrub understorey in the tall 
Late Cretaceous conifer rainforest of the ASE depicted in Fig. 3. The 
lignite layer is rich in spores of Stereisporites antiquasporites (NLR: 
Bryophyte, Sphagnum), which further suggest the temporary existence
of a peat swamp in the diverse temperate lowland rainforest. This
coincides with increasing Peninsulapollis pollen, indicating increasing
humidity towards the record’s top. Thin sections were carefully
prepared from resin-impregnated core samples selected from cores 
9R and 10R (see Methods) to characterize the fossil roots. Although
cell structures were not sufficiently preserved for identification of the 
plant that grew the roots, the presence of parenchyma cells within the 
long and continuous roots makes it likely that the network comprises 


Nature | Vol 580 | 2 April 2020 | 83




Fig. 3 | Reconstruction of the West Antarctic Turonian-Santonian 
temperate rainforest. The painting is based on palaeofloral and 
environmental information inferred from palynological, geochemical, 
sedimentological and organic biomarker data obtained from cores 9R and 10R


vascular plant remains and thus confirms active plant growth at our 
site (Extended Data Fig. 5b-e). Furthermore, the alignment of organic 
and clastic material within the laminated to stratified mudstone matrix 
(Extended Data Fig. 5a) suggests synchronous deposition of clastic 
particles and organic fragments. 

Our environmental reconstruction is further supported by geo- 
chemical and biomarker data. In the mudstone between 29.80 and 
27.03 mbsf and the indurated lignite interval (26.83-26.77 mbsf), zero 
to very low halite and carbonate contents in the bulk sediment frac- 
tion combined with low total organic carbon/total nitrogen (TOC/TN) 
ratios and low ratios of land-plant-derived long-chain n-alkanes versus
aquatic-sourced short-chain n-alkanes (TAR), point to swampy aquatic 
freshwater conditions (Fig. 2). This interpretation is supported by the 
identification of cells that closely resemble aerenchyma (Extended 
Data Fig. 5d), which is usually responsible for intercellular gas exchange 
under (semi-) permanent subaquatic growing conditions. In mudstone
samples taken from the core segment that contains a particularly
dense root network (27.03-26.83 mbsf), pollen and biomarkers indicate 
the establishment of terrestrial forest-type vegetation, while elevated 
pristane/n-C₁₇ and pristane/phytane ratios point to a high abundance
of terrigenous plant material (Extended Data Fig. 6; see also refs. *”**), 
which is in line with the pollen-based interpretation of a terrestrial 
rainforest environment. TOC/TN ratios >20 (Fig. 2) are consistent with 
this interpretation and indicate a primarily land-plant source of organic 
matter within this mudstone sequence.

The clay mineral assemblage in cores 9R and 10R is dominated by
kaolinite (67-72%) and smectite (26-29%), both indicating chemical
weathering activity under humid and (sub-) tropical climate conditions.
However, as this is not corroborated by our reconstructed
climatic setting, we attribute kaolinite formation in the mudstone to
the repeated establishment of swampy conditions, in which organic
acids altered silicate minerals to kaolinite (‘Moorverwitterung’).

The lithological successions in cores 9R and 10R resemble the 
uppermost strata of the Turonian-Santonian Tupuangi Formation 
on Pitt Island, New Zealand. The Pitt Island strata are characterized
at site PS104_20-2. The creation of the painting was further complemented by
published palaeotopographic and palaeotectonic information. Original
size of painting: 83.8 cm x 41.5 cm. Alfred-Wegener-Institut/J. McKay; this 
image is available under Creative Commons licence CC-BY 4.0. 


by interbedded carbonaceous siltstone, quartzofeldspathic sand- 
stone, lignite and/or peat layers. Similar to the sediment sequence 
described for the ASE, the Tupuangi Formation records a terrestrial, 
densely vegetated and partly swampy fluviodeltaic environment.
At around 90 Ma, the Tupuangi Formation was located in one of the
rift basins developing before Zealandia separated from West Antarctica,
~736 km from site PS104_20-2 (Fig. 1). A diverse conifer forest
surrounded by extensive river systems seems to have covered both
the Zealandian and the West Antarctic conjugate continental margin
during this early break-up phase. 

The sharp lithological change from the root-bearing fossiliferous 
mudstone with the thin layer of indurated lignite on top into the sand- 
stone at 26.77 mbsf is marked by increased iron carbonate and halite 
contents and decreased TOC/TN and TAR ratios within the sandstone 
(Fig. 2), suggesting an estuarine and coastal environment. The maximum
U-Pb dates of ~40 Ma obtained from the sandstone (see Extended
Data Fig. 1), which is coarse-grained at its base, indicate a considerable
hiatus between the mudstone (including the lignite) and the sandstone.
Such a hiatus is consistent with neodymium (Nd) and strontium (Sr)
isotope data, which reflect both a change in sediment provenance
and a decrease in weathering intensity between the two lithologies
(Fig. 2; see Methods). The time window of the hiatus coincides with slow
erosion rates of a tectonically quiescent passive margin, whereas
Eocene-Oligocene tectonic activity of the West Antarctic Rift System
might have triggered renewed sedimentation of dominantly clastic
material.


Palaeoclimate 

Multi-proxy evidence from our mid-Cretaceous sedimentary record 
reveals an environment at a palaeolatitude of ~82° S on the Antarc- 
tic continental margin that was characterized by a regional temper- 
ate climate warm enough to maintain a diverse temperate rainforest 
(Fig. 3) only ~900 km from the palaeo-South Pole. Our palynomorph- 
based climate reconstruction following the approach outlined in 
ref. “ returns a mean annual temperature of 13 °C with precipitation of


Fig. 4 | Modern and mid-Cretaceous CO₂ sensitivity runs. a-f, Distribution of
warmest mean month temperatures (WMMT, colour scale) for present-day
(a-c) and mid-Cretaceous at 90 Ma (d-f) configurations under atmospheric
CO₂ levels of 280, 560 and 1,120 ppmv (representing 1x, 2x and 4x PI CO₂).

The black triangle indicates the approximate position of site PS104_20-2. 

g, Modelled mid-Cretaceous WMMT (dashed lines) and zonal mean 
temperatures (solid lines) for different atmospheric CO₂ concentrations. The


around 1,120 mm yr⁻¹. The average temperature of the warmest summer
month was 18.5 °C. Previous quantitative climate analyses from Antarctic
records ~2,500 km further north resulted in Coniacian-Santonian
(~89-84 Ma) mean annual air temperatures of 15-21 °C (refs. *”*8),
suggesting a shallow temperature gradient towards our site. Estimates of
the Late Cretaceous climate based on NLRs generally agree well with
other temperature proxies. However, the approach assumes similarity
of climate requirements for fossil taxa and their NLRs. With increasing
age, the phylogenetic relationships of a fossil taxon become more disparate
and the assumption thus becomes less robust. We therefore applied
an independent geochemical palaeothermometer (HTI₃₀) based on the
distribution of the heterocyst glycolipids (ref. *°), which corroborated
our bioclimatic reconstructions by indicating austral summer lake or
river-surface water temperatures of ~20 °C for the swampy rainforest
(Extended Data Fig. 4; see Methods). Our record contains, to our knowledge,
the hitherto southernmost evidence of Cretaceous terrestrial
environmental conditions and reveals a mid-Cretaceous ‘greenhouse
climate’ that was capable of maintaining temperate conditions much
farther south than previously documented.
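
As a quick plausibility check on the distance from the palaeo-South Pole quoted in this section, a palaeolatitude of ~82° S lies 8° of arc from the pole. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, a value that is not taken from the source:

$$d \approx (90^\circ - 82^\circ) \times \frac{\pi}{180^\circ} \times 6{,}371\ \mathrm{km} \approx 890\ \mathrm{km}$$

This agrees with the rounded figure of ~900 km.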


Palaeoclimate modelling 

In light of extremely limited mid-Cretaceous CO₂ proxy data and widely
scattered existing data estimates, and to identify some of the pivotal
driving mechanisms of high-latitude mid-Cretaceous environmental
conditions reconstructed for our new record, we ran the global climate
model COSMOS in a coupled atmosphere-ocean configuration
with fixed vegetation. We did so under present-day (Fig. 4a-c)
and mid-Cretaceous configurations at 90 Ma (Fig. 4d-g) for 1x, 2x, 4x
and 6x the pre-industrial (PI) CO₂ level of 280 ppmv (280, 560, 1,120 and
1,680 ppmv, respectively; see Methods). Although the model predicts a
mid-Cretaceous climate in West Antarctica that is already warmer under
PI CO₂ levels (Fig. 4d), summer surface air and water temperatures of
~20 °C at ~82° S can only be reproduced by forcing the climate with very
high atmospheric CO₂ levels between 1,120 and 1,680 ppmv (Fig. 4f, g).
Our reconstructed mean annual air temperature of 13 °C, however, still
remains strongly underestimated by the model (Fig. 4g).
temperature estimates (data points), including their respective calibration
error (2σ), were derived from the following proxies referred to in ref. *:
terrestrial δ¹⁸O of vertebrate tooth enamel and/or pedogenic carbonate
(filled squares), palaeobotanical data (filled circles), fish enamel δ¹⁸O (open
triangles), marine calcareous fossil δ¹⁸O (open diamonds) and biomarkers
(cross). Temperature estimates from this study are indicated by a red filled
circle (palaeobotany) and red cross (HG palaeothermometry).


We conclude that a temperate climate at such a high latitude with
more than four months of complete polar night darkness requires
a combination of both strongly elevated atmospheric CO₂ concentrations
and dense surface vegetation that generates a low planetary
albedo with an associated high radiant energy absorption and
pronounced seasonality. This largely precludes the existence of
any substantial ice-sheet and sea-ice cover in and around Antarctica
during the Turonian to Santonian stages of the Late Cretaceous
epoch, an interpretation supported by palaeogeographic reconstructions
of that period. Conversely, the present Antarctic Ice
Sheet and its associated climate feedbacks, such as ice albedo,
would provide a stabilizing cooling effect in a future high-CO₂ world
(Fig. 4a-c).

To further elaborate on the importance of additional forcing mech- 
anisms, to discover the interdependency of surface vegetation and 
temperature sensitivity in more detail and to explore the drivers of the 
paradox in the late Cretaceous latitudinal temperature gradient visible 
in Fig. 4, future work should aim to run the model with various types of 
vegetation cover coupled with other drivers such as palaeogeography
or changes in cloudiness.

Our findings highlight the importance of including land-ice 
changes in long-term climate simulations to accurately estimate 
climate sensitivity on these extended timescales. We provide key
data for constraining the response of polar terrestrial ecosystems
to very high atmospheric CO₂ concentrations and for assessing the
impact of Antarctic ice sheet presence under high-CO₂ scenarios,
both of which are essential for modelling past and future climate
change.
Methods


Seafloor drill rig MARUM-MeBo70 

MARUM-MeBo70 is a robotic drill rig that was deployed on the sea- 
bed and remotely controlled from RV Polarstern during expedition 
PS104. Detailed information about the drill rig and its operation is
published in ref. *. 


X-ray CT 

Whole rounds of MeBo core PS104_20-2 were scanned by a Toshiba
Aquilion 64 computer tomograph at the hospital Klinikum Bremen-Mitte,
with an X-ray source voltage of 120 kV and a current of 600 mA.
The CT scans have a resolution of 0.351 mm in the x and y directions and
0.5 mm resolution in the z direction (resolution of scaled reconstruction:
0.195 × 0.195 × 0.3 mm³). Images were reconstructed using
Toshiba’s patented helical cone beam reconstruction technique. The
obtained CT data were processed using the ZIB edition of the Amira
software (version 2017.39). Within Amira, the CT scans of the core
sections were merged when necessary and core liners, including
about 2 mm of the core rims, were removed from the dataset until
all marginal artefacts from the coring process were removed. Subsequently,
all clasts larger than ~1 mm, root-traces (where present)
and matrix sediment were segmented with the (marker-based) watershed
tool of the Segmentation Editor. Markers were predominantly
set with the threshold tool. For only rarely occurring clasts with an
X-ray attenuation close to the matrix sediment, the magic wand tool
was used to manually set additional markers. Holes within clasts
after the watershed segmentation were added to the clasts with the
selection fill tool.


Palynology 

Between 2 and 6 g of dry-weight sediment per sample were processed
at Northumbria University following standard palynological techniques,
including sieving (10 µm) and acid treatment with 10% HCl
(hydrochloric acid) and cold 38% HF (hydrofluoric acid). The processed
residue was transferred to microscope slides using glycerine jelly as a
mounting medium, and 2-3 slides were analysed per sample at 400x
magnification. Of the 17 samples analysed for pollen and spores, 7 were
productive, and total counts range from 340 to 360 pollen grains and
spores per sample (Extended Data Figs. 2, 3; Extended Data Table 1).
Pollen concentrations increase from an average of ~6,500 grains per
gram of sediment in the lower three samples to 61,000-121,500 grains
per gram at the top. We could not identify any reworking of palynomorphs.
Percentages were calculated on the basis of the sum total of pollen and
spores; 65 pollen and spore taxa were identified from the literature
(Extended Data Table 3). All samples contained a high morphological
diversity of Podocarpus pollen, which we classified as Podocarpidites
undiff. as many of these grains were either folded or damaged and were
therefore unidentifiable beyond family level. Marine dinoflagellate
cysts were absent in all samples.


Palynomorph-based climate reconstructions (bioclimatic 
analysis) 

We reconstructed terrestrial mean annual temperature (MAT), precipi- 
tation (MAP) and WMMT using the NLR approach. The NLR approach 
uses the climatic requirements of the NLR of fossil taxa to reconstruct 
the past climatic range and assumes that the climatic requirements 
of the fossil taxa are similar to those of their NLR (Extended Data 
Table 2). NLR approaches use the presence or absence of individual 
taxa in a fossil assemblage rather than relative abundance, which
reduces the likelihood of taphonomic biases. This facilitates, to some
extent, the reconstruction of past, non-modern analogue climates and
environments. NLR-based temperature estimates are generally in
good agreement with estimates from geochemical and other palaeobotanical
methods, including the Climate Leaf Analysis Multivariate
Program (CLAMP) and leaf margin analysis, providing confidence
in the utility of the method for the reconstruction of pre-Quaternary
climates.

However, quantitative climate estimates from the fossil plant record
of deep-time geological intervals are always accompanied by large
uncertainties. Incorrect use of outliers and fossil taxa with ambiguous
affinity can result in erroneous climate estimates. One of the greatest
weaknesses that affects all NLR approaches is the assumption of
uniformitarianism, namely, that the climate tolerances of modern
species can be extended into the past. This assumption inevitably introduces
uncertainty that increases with the age of the geological formation.
To statistically constrain the most likely climatic co-occurrence
envelope, we combined the NLR approach with the probability density
function (PDF) method. In contrast to other NLR methods, such
as the coexistence approach, the PDF method mathematically reduces
the potential impact of wrongly defined climate tolerances on the upper
and lower limits of palaeoclimatic estimates. To further reduce uncertainties
caused by potentially wrong identification of NLRs, we removed fossil
taxa with potentially ambiguous affinity or very rare occurrence in the
fossil record (Extended Data Table 2). This includes Microcachryidites
antarcticus, a taxon abundant and widespread in the Antarctic
fossil record, with the NLR Microcachrys tetragona (the sole species
of the genus Microcachrys, which is now endemic to Tasmania). Another
example is Peninsulapollis gillii, with close links to the modern genus
Beauprea, which is endemic to New Caledonia. In both cases we used the
family, Podocarpaceae and Proteaceae, respectively, rather than the
genus or species as the NLR.

To generate the palaeoclimate estimate, we followed the procedure
described in refs. °’’. We first identified the bioclimatic envelope
for each NLR by cross-plotting their modern distribution from the
Global Biodiversity Information Facility (GBIF) with the gridded
WorldClim climate surface using the dismo package in R. We then
filtered the dataset and removed redundant data, ‘exotic’ occurrences
(such as garden plants) as well as multiple entries per climate
grid cell to avoid the climatic probability function becoming highly
slanted towards that location. Before establishing the PDFs, bootstrapping
was applied to test the robustness of the dataset, which
is of particular interest for taxa with only a few modern occurrences.
Following the bootstrapping, we calculated the likelihood (f) of a
taxon (t) occurring at value (x) for a certain climatic variable by using
the mean (μ) and standard deviation (σ) of the modern distribution
range of each taxon.
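
The likelihood function itself is not reproduced in this copy of the text. A minimal sketch, assuming (as the use of a mean and a standard deviation suggests) that the modern climatic range of each taxon is modelled as a normal distribution:

$$f(x)_t = \frac{1}{\sqrt{2\pi\sigma_t^{2}}}\,\exp\!\left(-\frac{(x-\mu_t)^{2}}{2\sigma_t^{2}}\right)$$

Here μ_t and σ_t denote the mean and standard deviation of the modern distribution of the NLR of taxon t for the climate variable in question; the subscript t on μ and σ is added here for clarity and is not notation taken from the source.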
Because the separate reconstruction of climate ranges for each
variable can lead to bioclimatic envelopes that include intervals where
no modern-day occurrence of t is observed, we calculated joint likelihood
PDFs for each combination of the climate variables MAT, MAP
and WMMT using the correlation coefficient ρ(x, y).
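
The joint likelihood is likewise not reproduced in this copy. As a hedged illustration only, assuming the standard bivariate normal density for two climate variables x and y with correlation coefficient ρ = ρ(x, y):

$$f(x, y)_t = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^{2}}}\,\exp\!\left(-\frac{1}{2(1-\rho^{2})}\left[\frac{(x-\mu_x)^{2}}{\sigma_x^{2}} - \frac{2\rho\,(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^{2}}{\sigma_y^{2}}\right]\right)$$

The symbols μ_x, σ_x, μ_y and σ_y (means and standard deviations of the modern distribution of the NLR of taxon t for each variable) are labels introduced for this sketch.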
After assessing whether all bioclimatic envelopes share a coexistence
interval, the climate estimates of the NLR assemblage were reconstructed
by multiplying the individual joint likelihoods of taxa
$f(x, y)_{t_1}, \dots, f(x, y)_{t_n}$ with each other:

$$L(x, y) = f(x, y)_{t_1} \times \dots \times f(x, y)_{t_n}$$
To constrain the core distribution of a group, we determined
the range of one ($f(x, y)_{\mathrm{relative}} = 0.157$) and two standard deviations
($f(x, y)_{\mathrm{relative}} = 0.023$) from the occurrence within a group, with $f(x, y)_{\max}$
representing the most likely climate conditions:

$$f(x, y)_{\mathrm{relative}} = \frac{f(x, y)}{f(x, y)_{\max}}$$
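
The steps described in this section (per-taxon joint likelihoods, their product across the assemblage, and normalization by the maximum) can be sketched in pure Python. This is a minimal illustration and not the authors' R workflow: the bivariate normal form, the taxon parameters in `TAXA`, the grids and all function names are assumptions made for the example.

```python
import math


def joint_likelihood(x, y, mean_x, sd_x, mean_y, sd_y, rho):
    """Bivariate normal density at (x, y) for one taxon (assumed model)."""
    zx = (x - mean_x) / sd_x
    zy = (y - mean_y) / sd_y
    norm = 2.0 * math.pi * sd_x * sd_y * math.sqrt(1.0 - rho ** 2)
    exponent = -(zx ** 2 - 2.0 * rho * zx * zy + zy ** 2) / (2.0 * (1.0 - rho ** 2))
    return math.exp(exponent) / norm


def assemblage_likelihood(x, y, taxa):
    """Product of the per-taxon joint likelihoods at (x, y)."""
    result = 1.0
    for params in taxa:
        result *= joint_likelihood(x, y, *params)
    return result


def most_likely_climate(taxa, xs, ys):
    """Grid search: return the most likely (x, y) and the relative likelihoods."""
    grid = {(x, y): assemblage_likelihood(x, y, taxa) for x in xs for y in ys}
    best = max(grid, key=grid.get)
    peak = grid[best]
    relative = {point: value / peak for point, value in grid.items()}
    return best, relative


# Hypothetical taxa: (mean MAT, sd MAT, mean WMMT, sd WMMT, correlation), in °C.
TAXA = [
    (12.0, 3.0, 18.0, 3.0, 0.5),
    (14.0, 3.0, 19.0, 3.0, 0.5),
]
MAT_GRID = [i * 0.5 for i in range(61)]   # 0 to 30 °C in 0.5 °C steps
WMMT_GRID = [i * 0.5 for i in range(81)]  # 0 to 40 °C in 0.5 °C steps

best, relative = most_likely_climate(TAXA, MAT_GRID, WMMT_GRID)
```

Because the two hypothetical taxa share the same spread and correlation, the most likely climate falls midway between their means. Grid cells whose relative likelihood exceeds a chosen threshold would then delimit the reported range.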

For our bioclimatic analysis, we used all pollen and spore taxa that 
could be related to an NLR (following ref. °°, Extended Data Table 2). Climatic
ranges are indicated with their ±2σ range. We calculated an MAT
of 12.8 ± 2.2 °C, WMMT of 18.4 ± 1.9 °C and MAP of 1,120 ± 330 mm yr⁻¹. It
should be noted that the ranges of these values show the mathematical
error and not the real range, which might result from the uncertainties
in using an NLR approach. To avoid misunderstandings, we therefore
indicated in the main text the pollen-based climate estimates without
2σ ranges.


Organic geochemistry 

Freeze-dried and homogenized sediment samples were extracted by
means of ultrasonication using a dichloromethane:methanol mixture
(2:1, v:v). After centrifugation, the total lipid extract was dried by
rotary evaporation. The extraction was repeated twice. The combined
total lipid extract was fractionated using silica open-column chromatography
and hexane as eluent to obtain apolar lipids. Hydrocarbons
were analysed using an HP gas chromatograph 6890 (30 m DB-5MS
column, 0.25 mm diameter, 0.25 µm film thickness). The identification
of n-alkanes, pristane and phytane was based on comparison of their
retention times with those of reference compounds that were run on
the same instrument. The TAR was calculated using peak areas of
long-chain (n-C₂₇, n-C₂₉, n-C₃₁) against short-chain (n-C₁₅, n-C₁₇, n-C₁₉)
alkanes. The carbon preference index (CPI) was calculated as follows:


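$$\mathrm{CPI} = \frac{2 \times (n\text{-}C_{23} + n\text{-}C_{25} + n\text{-}C_{27} + n\text{-}C_{29})}{n\text{-}C_{22} + 2 \times (n\text{-}C_{24} + n\text{-}C_{26} + n\text{-}C_{28}) + n\text{-}C_{30}} \tag{1}$$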
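
Both n-alkane indices reduce to simple ratios of peak areas, as the following minimal Python sketch shows. The function names and the dictionary layout (carbon number mapped to peak area) are hypothetical. The chain lengths used for the TAR (C₂₇, C₂₉, C₃₁ against C₁₅, C₁₇, C₁₉) follow the commonly used definition and are an assumption here, because the subscripts are partly illegible in this copy.

```python
def terrigenous_aquatic_ratio(areas):
    """TAR: long-chain (C27, C29, C31) over short-chain (C15, C17, C19) n-alkanes."""
    long_chain = areas[27] + areas[29] + areas[31]
    short_chain = areas[15] + areas[17] + areas[19]
    return long_chain / short_chain


def carbon_preference_index(areas):
    """CPI following equation (1): odd C23-C29 against even C22-C30 n-alkanes."""
    odd = areas[23] + areas[25] + areas[27] + areas[29]
    even_inner = areas[24] + areas[26] + areas[28]
    return 2.0 * odd / (areas[22] + 2.0 * even_inner + areas[30])


# Hypothetical peak areas (arbitrary units), keyed by carbon number.
example_areas = {n: (2.0 if n % 2 else 1.0) for n in range(15, 32)}
tar = terrigenous_aquatic_ratio(example_areas)
cpi = carbon_preference_index(example_areas)
```

In this made-up example every odd-numbered homologue has twice the peak area of the even-numbered ones, so the CPI is 2, the kind of odd-over-even predominance usually associated with land-plant waxes.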


Heterocyst glycolipid palaeothermometry 

Sediment samples from the coastal sandstone (9R, 50-52 cm; 
26.76 mbsf) and the carbonaceous mudstone (9R, 76.5-78 cm; 
27.02 mbsf; 10R, 60-62 cm; 29.21 mbsf) were lyophilized and ground 
to a fine sediment powder using a solvent-cleaned agate pestle and 
mortar. Between 20.1 and 29.7 g of sediment were extracted using a 
modified Bligh and Dyer procedure. Briefly, sediment samples were
extracted ultrasonically (for 10 min) three times in a solvent mixture of
MeOH, DCM and phosphate buffer (2:1:0.8; v:v:v). After each sonication
step, the solvent mixture was centrifuged at 1,500g for 3 min and the
supernatant transferred to a centrifuge tube. The combined supernatants
were phase separated by adding DCM and phosphate buffer
to a final solvent ratio of 1:1:0.9 (v:v:v). The organic bottom layer was
collected in a round bottom flask and reduced under vacuum using a
rotary evaporator. Each Bligh and Dyer extract (BDE) was transferred to
a preweighed vial using DCM:MeOH (1:1, v:v) and dried under a gentle
stream of N₂. Before analysis, all BDEs were redissolved in a solvent
mixture of n-hexane:2-propanol:H₂O (72:27:1; v:v:v) to a concentration
of 8 mg ml⁻¹. A procedural blank was added to the sample batch and
treated as a regular sample to test for possible cross-contamination 
during sample preparation. 

High-performance liquid chromatography coupled to electrospray
ionisation tandem mass spectrometry (HPLC/ESI-MS²) was performed
on the BDEs following the analytical procedure given by ref. * to establish
heterocyst glycolipid (HG) distribution patterns and determine
relative abundances. Separation of HGs was achieved using a Waters
Alliance 2690 HPLC system fitted with a Phenomenex Luna NH₂ column
(150 × 2 mm; 3 µm particle size) and a guard column of the same
material. Both were maintained at a constant temperature of 30 °C.
The applied gradient profile was as follows: 95% A:5% B to 85% A:15% B
in 10 min (isocratic for 7 min) at 0.5 ml min⁻¹, followed by back flushing
with 30% A:70% B at 0.2 ml min⁻¹ for 25 min and re-equilibrating
the column with 95% A:5% B for 15 min. Solvent A was
n-hexane:2-propanol:HCO₂H:14.8 M NH₃ aq. (79:20:0.12:0.04; v:v:v:v) and Solvent B
was 2-propanol:water:HCO₂H:14.8 M NH₃ aq. (88:10:0.12:0.04; v:v:v:v).

HGs were detected using a Micromass Quattro LC triple quadrupole
mass spectrometer equipped with an electrospray ionization interface
and operated in positive ion mode. Source conditions were as given in
ref. ”’. All BDEs were analysed in multiple reaction monitoring mode to
achieve maximum specificity. HGs were identified on the basis of a comparison
of retention times with those of HGs in cultured cyanobacteria,
as well as published mass spectra. HGs were monitored using the
following transitions: m/z 547 > 415 (pentose HG₂₆ diol), m/z 603 > 471
(pentose HG₃₀ diol), m/z 619 > 487 (pentose HG₃₀ triol), m/z 647 > 515
(pentose HG₃₂ triol), m/z 561 > 415 (deoxyhexose HG₂₆ diol), m/z 575 >
413 (HG₂₆ keto-ol), m/z 577 > 415 (HG₂₆ diol), m/z 603 > 441 (HG₂₈ keto-ol),
m/z 605 > 443 (HG₂₈ diol), m/z 619 > 457 (HG₂₈ keto-diol), m/z 621 >
459 (HG₂₈ triol), m/z 635 > 459 (methylated hexose HG₂₈ triol), m/z 647
> 485 (HG₃₀ keto-diol), m/z 649 > 487 (HG₃₀ triol), m/z 675 > 513 (HG₃₂
keto-diol), m/z 677 > 515 (HG₃₂ triol) and quantified by integrating peak
areas using the QuanLynx application software (version 4.1 SCN856).

Surface water temperatures (SWTs) during the deposition of the
coastal Eocene sandstone were reconstructed using the HDI₂₆ (heterocyst
diol index of 26 carbon atoms) and HDI₂₈ (HDI of 28 carbon
atoms) lipid palaeothermometers as described in ref. *°. As the HG
content of the swampy palaeoenvironment exclusively consisted of
HG₃₀ triols and HG₃₀ keto-diol (Extended Data Fig. 4), which are specific
to cyanobacteria that form benthic microbial mats, we here applied
the HTI₃₀ (heterocyst triol index of 30 carbon atoms) to the mudstone
sequence. This index is defined as follows:


$$\mathrm{HTI}_{30} = \frac{\mathrm{HG}_{30}\ \text{triol}}{\mathrm{HG}_{30}\ \text{triol} + \mathrm{HG}_{30}\ \text{keto-diol}}$$


The HTI₃₀ was transferred to absolute temperatures using a surface
sediment calibration obtained from a large set of East African lakes
(n = 47) located on an altitudinal transect from 615 to 4,504 m above
sea level with SWTs ranging from 5.7 to 27.9 °C. In this setting, the HTI₃₀
showed a strong linear correlation with SWT, which is expressed in the
equation below (T.B., unpublished data):


$$\mathrm{SWT} = \frac{\mathrm{HTI}_{30}}{0.0249} - \frac{0.2609}{0.0249}$$


Independent confirmation for the robustness of the HG-based temperature
reconstruction is obtained by comparing HG distribution patterns
and HTI₃₀ values in the mudstone sequence with those reported
for an axenic culture of the heterocystous cyanobacterium Scytonema
sp. PCC (Pasteur Culture collection of Cyanobacteria) 10023 (ref. **).
This cyanobacterium exclusively contains HG₃₀ triols and HG₃₀ keto-diols.
The above transfer function yields an HTI₃₀ value of ~0.88 for
the culture grown at an ambient temperature of 25 °C. This value is
identical to the HTI₃₀ (0.88) calculated using the relative abundances
of the major HG₃₀ triol and HG₃₀ keto-diol isomers reported in ref. **.
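
The index and its linear transfer function are easy to check numerically. The sketch below uses the slope (0.0249) and intercept (0.2609) stated in this section; the function names and the example peak areas are hypothetical.

```python
SLOPE = 0.0249      # HTI30 units per °C, from the calibration equation in the text
INTERCEPT = 0.2609  # HTI30 at 0 °C, from the calibration equation in the text


def hti30(triol_area, keto_diol_area):
    """Heterocyst triol index from HG30 triol and HG30 keto-diol abundances."""
    return triol_area / (triol_area + keto_diol_area)


def swt_from_hti30(index):
    """Surface water temperature (°C) from HTI30, i.e. (HTI30 - 0.2609) / 0.0249."""
    return (index - INTERCEPT) / SLOPE


def hti30_from_swt(swt_celsius):
    """Inverse of the transfer function: expected HTI30 at a given temperature."""
    return INTERCEPT + SLOPE * swt_celsius


culture_index = hti30_from_swt(25.0)        # compare with the ~0.88 quoted for the culture
culture_temp = swt_from_hti30(0.88)         # temperature implied by an index of 0.88
example_index = hti30(3.0, 1.0)             # hypothetical peak areas
```

Evaluating the inverse at 25 °C gives an index close to the 0.88 quoted for the Scytonema culture, which is the consistency check described in the paragraph above.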


Grain-size analyses 

A set of discrete samples was wet sieved at 2 mm and 63 µm to separate
the gravel, sand and mud grain-size classes. The <63 µm (mud) suspension
was separated into silt (2-63 µm) and clay (<2 µm) using settling
velocity (Stokes’ Law) in Atterberg tubes.
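
To illustrate why settling separates silt from clay, the following sketch evaluates Stokes' law for a sphere. None of the physical constants are given in the source: quartz grain density, water properties at 20 °C and the 0.2 m fall height are assumptions chosen for the example, and the function names are hypothetical.

```python
G = 9.81              # gravitational acceleration, m s^-2
RHO_GRAIN = 2650.0    # assumed grain density (quartz), kg m^-3
RHO_WATER = 998.2     # assumed water density at 20 °C, kg m^-3
VISCOSITY = 1.002e-3  # assumed dynamic viscosity of water at 20 °C, Pa s


def stokes_velocity(diameter_m):
    """Terminal settling velocity (m s^-1) of a small sphere under Stokes' law."""
    return (RHO_GRAIN - RHO_WATER) * G * diameter_m ** 2 / (18.0 * VISCOSITY)


def settling_time_hours(diameter_m, fall_height_m):
    """Time (h) for a grain of the given diameter to sink through fall_height_m."""
    return fall_height_m / stokes_velocity(diameter_m) / 3600.0


clay_limit_velocity = stokes_velocity(2e-6)       # 2 µm silt/clay boundary
silt_limit_velocity = stokes_velocity(63e-6)      # 63 µm sand/silt boundary
clay_limit_time = settling_time_hours(2e-6, 0.2)  # hypothetical 0.2 m fall height
```

Because velocity scales with the square of the diameter, a 63 µm grain settles roughly a thousand times faster than a 2 µm grain. Particles still in suspension after the settling time computed for the 2 µm boundary belong to the clay fraction.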


Clay mineral analyses 

An aliquot of the clay fraction was used to determine the relative contents
of the clay minerals smectite, illite, chlorite and kaolinite using
an automated powder diffractometer system Rigaku MiniFlex with Co
Kα radiation (30 kV, 15 mA) at the Institute for Geophysics and Geology
(University of Leipzig). The clay mineral identification and quantification
followed standard X-ray diffraction methods.


Bulk sediment composition 
Total carbon (TC) and total nitrogen (TN) contents were analysed with 
an Elementar Vario EL III. The TOC contents were determined after 
removal of the total inorganic carbon (carbonates) with HCl using an
ELTRA CS-2000. Carbonate content was calculated by subtracting
the TOC from the TC and multiplying the difference (total inorganic
carbon) by 8.33; that is, the ratio between the molecular weights of
CaCO₃ and C. The TOC:TN (C:N) ratio was calculated on a molar basis.
The mineralogical composition of the milled bulk sediment was analysed
semiquantitatively with X-ray diffraction using peak intensities
and area ratios analysed with the MacDiff program (version 4.2.6).
For the Fe(Ca) carbonates the peak intensities for ankerite (at 2.906 Å)
and siderite (at 2.795 Å) were used and summed up as percentages for
Fe(Ca) carbonates (ankerite and siderite) in relation to the absolute
percentage of other carbonates (calcite, Mg calcite and dolomite).
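
The two bulk calculations described in this section can be written out as a short sketch. The factor 8.33 is the one given in the text; the atomic masses used for the molar conversion, the function names and the example weight percentages are assumptions introduced for illustration.

```python
CACO3_TO_C = 8.33   # ratio of the molecular weights of CaCO3 and C, as stated in the text
MASS_C = 12.011     # atomic mass of carbon, g mol^-1 (assumed standard value)
MASS_N = 14.007     # atomic mass of nitrogen, g mol^-1 (assumed standard value)


def carbonate_percent(total_carbon_wt, toc_wt):
    """Carbonate content (wt%) from total carbon and total organic carbon (wt%)."""
    inorganic_carbon = total_carbon_wt - toc_wt
    return inorganic_carbon * CACO3_TO_C


def molar_c_to_n(toc_wt, tn_wt):
    """TOC:TN ratio on a molar basis from weight percentages."""
    return (toc_wt / MASS_C) / (tn_wt / MASS_N)


# Hypothetical sample: 3.0 wt% TC, 2.0 wt% TOC, 0.1 wt% TN.
example_carbonate = carbonate_percent(3.0, 2.0)
example_c_to_n = molar_c_to_n(2.0, 0.1)
```

Converting to moles raises the ratio by a factor of about 1.17 relative to the plain weight ratio, which matters when comparing against thresholds such as the TOC/TN value of 20 used in the main text.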


Thin sections 

After drying the untreated soft sediment in the fridge for 2-3 days,
the sediment was dried at room temperature (20-22 °C) for another
2-3 days. During that time the sediment was checked daily for crack
formation. Under low pressure, the sediment was impregnated stepwise
in a vacuum desiccator with epoxy Araldite 2020 resin until full coverage
of the sample was achieved. After complete hardening, the bottom of
the sample was ground by a Tegrapol with silicon carbide (SiC) paper
sizes from 80 to 800, depending on sediment characteristics, and a
maximum of 150 rotations per minute until the sediment surface was
reached. The glass slides for the thin sections, which were 3 mm thick
and 35 × 120 mm in area, were ground with a 9-µm-fraction SiC paper
to achieve both grip and an even surface (alternative machine system:
Logitech LP50 auto). Then the sample was attached to the slide with
the same resin used for impregnation by a pressure block. Afterwards,
the surface of the glass was cleaned and labelled with a diamond pen.
Most samples were then cut by a WOCO 50 diamond saw to achieve
250-µm-thick sediment strips on the glass, before grinding with SiC
paper or the Logitech LP50 to reach a thickness of 30 µm. Only some
sections were covered with 150-µm-thick glasses, for which an ultraviolet
resin (cyanoacrylate) was used. Most sections remained uncovered
for Raman and SEM-EDX spectroscopy. Finally, all thin sections
were cleaned with ethanol. The set of thin sections was prepared by
MKfactory.


Palaeomagnetic measurements 

Five discrete samples were taken at variable spacings from cores 9R
and 10R of core PS104_20-2 for palaeomagnetic investigations using
plastic boxes with inner dimensions of 2 × 2 × 2 cm³. The directions
and intensities of natural remanent magnetization were measured
on a cryogenic magnetometer (model 2G Enterprises 755 HR). Subsequent
alternating field (AF) demagnetization of natural remanent
magnetization involved 15 steps to a maximum AF intensity of 100 mT.
A detailed vector analysis was applied to the results to determine the
characteristic remanent magnetization of each sample and to unravel
its magnetic polarity. Samples showing no systematic demagnetization
pattern were excluded from further interpretation.


Palaeoclimate modelling 

We use the COSMOS model (see Code availability) in a coupled atmosphere-ocean
configuration with fixed vegetation. The atmosphere
component ECHAM5 is run in a T31/L19 resolution. It consists of 19
vertical layers and has a horizontal resolution of ~3.75°. The ocean component
MPI-OM runs in a GR30/L40 configuration. It has a formal
horizontal resolution of 3.0° × 1.8° and consists of 40 unequal vertical
layers. The high-resolution hydrological discharge model is a part of
ECHAM5, while MPI-OM includes a dynamic-thermodynamic sea-ice
model using a viscous-plastic rheology. Climate simulations were run
for present-day and mid-Cretaceous configurations under different
CO₂ levels in the atmosphere. Other greenhouse gases (such as CH₄
and N₂O) were set to PI levels. In the mid-Cretaceous simulations, we
employed published palaeogeography and vegetation as well as no ice
sheets in both hemispheres. The orbital configurations in all Cretaceous
experiments were fixed at 800 common era (CE) and hence represent
values from the beginning of an externally forced simulation from 800
to 1800 CE (a so-called millennial run). The solar constant was reduced
by 1% for the mid-Cretaceous experiments relative to the present-day
value. The simulations with 1x and 2x PI CO₂ levels were run for 9,200
and 9,000 years, respectively, and 10,600 years for 4x PI CO₂ (ref. **).
All simulations reached equilibrium at the surface. The experiment
with a 6x PI CO₂ level had a slightly different atmospheric land-sea
mask than the other three simulations. It was run for ~500 years and
was not in a full equilibrium at the surface. The PI control simulation
was run for ~7,500 years. The simulations with 2x and 4x PI CO₂ levels
were branched off from the 1x PI simulation from model year 6,800
and were further run for 700 years. The simulations reach either full
or quasi-equilibrium at the surface. For the analyses the mean was
taken over the last 100 years of each simulation. The model has been
successfully applied previously for scientific questions focusing on the
Quaternary, Neogene, Palaeogene and Late Cretaceous,
as well as estimates of future climate.


Sr and Nd isotopic measurements 

A total of seven samples were selected for processing from cores 9R and 
10R at site PS104_20-2. A detailed method description that was applied 
for determining their Sr and Nd isotopic compositions is given in ref. ©. 


Zircon and apatite U-Pb geochronology

The youngest detrital zircon and apatite U-Pb ages obtained from 
the cores 2R (sample AWI-35 at 9.9 mbsf) and 9R (sample AWI-25 at 
26.7 mbsf) were used for constraining maximum deposition ages of 
the sandstone. The samples yielded Eocene apatite (n = 2) and zircon 
(n = 1) ages. The single Eocene zircon grain yields a Concordia age of 
45.5 ± 2.0 Myr (Extended Data Fig. 1a). The apatite grains all yield 
analyses discordant in U-Pb isotopic space due to the presence of common 
Pb (Pbc; that is, Pb incorporated during crystallization as opposed to 
radiogenic Pb* generated in situ by radionuclide decay). For single-grain 
ages, a terrestrial Pb isotope evolution model! was used for an 
initial estimate of 207Pbc/206Pbc, followed by an iterative approach to 
the 207Pb-based corrected age calculation’. 
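
The 207Pb-based correction described above can be illustrated with a short sketch. It is not the authors' iterative scheme: it assumes the common-Pb 207Pb/206Pb ratio is already known and finds, by bisection, where the line from that anchor through a measured point meets the concordia curve in Tera-Wasserburg space. The function names, the bracketing ages and the choice of 137.818 for present-day 238U/235U are assumptions made for illustration.

```python
import math

# Uranium decay constants (per year) and present-day 238U/235U.
LAMBDA_238 = 1.55125e-10
LAMBDA_235 = 9.8485e-10
U238_U235 = 137.818  # assumed value; 137.88 is also widely used


def concordia_point(age_yr):
    """Tera-Wasserburg concordia coordinates (238U/206Pb, 207Pb/206Pb)."""
    r68 = math.expm1(LAMBDA_238 * age_yr)  # radiogenic 206Pb/238U
    r75 = math.expm1(LAMBDA_235 * age_yr)  # radiogenic 207Pb/235U
    return 1.0 / r68, r75 / (U238_U235 * r68)


def pb207_corrected_age(u238_pb206, pb207_pb206, common_76,
                        t_min=1e3, t_max=4.0e9):
    """Lower-intercept age (years) of the line joining the common-Pb
    anchor (0, common_76) and the measured point, found by bisection."""
    slope = (pb207_pb206 - common_76) / u238_pb206

    def misfit(age):
        x, y = concordia_point(age)
        return common_76 + slope * x - y  # zero where the line meets concordia

    lo, hi = t_min, t_max
    if misfit(lo) * misfit(hi) > 0:
        raise ValueError("no lower intercept inside the bracketing ages")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if misfit(lo) * misfit(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical grain: 40 Myr old with 30% of its 206Pb being common Pb,
# anchored at the West Antarctic basement value of 0.834.
x_c, y_c = concordia_point(40e6)
measured_x = 0.7 * x_c
measured_y = 0.7 * y_c + 0.3 * 0.834
print(pb207_corrected_age(measured_x, measured_y, 0.834) / 1e6, "Myr")
```

Because mixing between radiogenic and common Pb is linear in Tera-Wasserburg space, the synthetic grain lies exactly on the anchor line and the sketch recovers its input age.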

As only two Eocene single-grain apatite ages are reported, the cal- 
culation of an array age would not normally be appropriate. However, 
comparison of the trace element chemistry (REE-Sr-Y) to an apatite 
compositional reference library’ indicates that both Eocene grains are 
chemically, as well as chronologically, indistinguishable (Extended Data 
Fig. 1b), increasing the likelihood of a common source. Therefore, the 
two youngest apatite grains from AWI-35 were jointly regressed with the 
range of 207Pbc/206Pbc values (0.834 ± 0.018) for West Antarctic crystalline 
basement’” (Extended Data Fig. 1a) to obtain a lower-intercept age 
of 39.3 ± 3.8 Myr (mean square weighted deviation, MSWD = 0.99), 
similar to the independently obtained single-grain Concordia age of 
45.5 ± 2.0 Myr determined from the youngest zircon from AWI-25. A 
Lutetian maximum deposition age (approximately 43 Myr) for AWI-35 
and AWI-25 is therefore indicated. 

Pure apatite and zircon separates were hand-picked from the 
non-magnetic heavy mineral 63-315 µm size fraction, mounted in epoxy 
resin, ground to reveal internal surfaces and polished. Almost no sample 
bias was introduced by grain selection because in most cases all of the 
observed mineral grains were picked as the amount of sample material 
was very small. All U-Pb analyses were carried out using a Photon 
Machines Analyte Excite 193 nm ArF excimer laser-ablation system with 
a HelEx 2-volume ablation cell coupled to an Agilent 7900 ICPMS at 
the Department of Geology, Trinity College Dublin. Laser fluence was 
2.5 J cm⁻² with a repetition rate of 15 Hz and an analysis time of 20 s, 
followed by an 8 s pause to allow for signal wash-out and a subsequent 
baseline measurement. Spot sizes of 47 µm and 24 µm were employed 
for apatite and zircon respectively, in separate analytical sessions. 

Data reduction employed the VizualAge and VizualAge UComPbine 
data reduction schemes (DRS) for Iolite for zircon and apatite, respec- 
tively’? "°, Each DRS corrects for intrasession analytical drift, mass bias 
and downhole fractionation using a user-specified fractionation model 
based on measurements of the primary standard; VizualAge UComPbine 
also permits the presence of a variable Pbc content in a primary 
age standard that must be corrected using a known initial 207Pbc/206Pbc 
value. Final U-Pb age calculations were made using the Isoplot add-in 
for Excel™. 

Single-grain zircon U-Pb Concordia ages were calculated, and 
analyses with a probability of concordance <0.001 were rejected”. 
The primary standard was Plešovice zircon; the GZ7 and 91500 
zircons were used as secondary standards and treated as unknowns 
during data reduction and age calculation”, yielding Concordia ages 
of 530.1 ± 3.7 Myr and 1,060.4 ± 6.8 Myr, respectively. 

For apatite analyses, Madagascar apatite was employed as the 
primary standard and McClure Mountain and Durango apatites were 
employed as secondary standards”*"*. The Pbc value in the secondary 
standards was corrected using fixed initial ratios, yielding weighted 
mean ages of 532.2 ± 6.0 Myr and 32.3 ± 0.7 Myr, respectively. Variable 
Pbc contents in the detrital apatite unknowns were corrected by 
using a terrestrial Pb evolution model” for the calculation of single-grain 
ages followed by an iterative calculation to obtain single-analysis 
207Pb-corrected ages’. Alternatively, the range of the 207Pbc/206Pbc 
values for West Antarctic basement® can be used for the single-grain 
age calculation: the resulting single-grain ages are within 1 Myr of the 
single-grain ages obtained using the iterative calculation. Apatite 
U-Pb age filtering” results in 2σ errors of <50% for grains with ages 
of 10-100 Myr and 2σ errors of <25% for grains with ages >100 Myr. 
For apatite trace-element analysis, the Iolite Trace Elements DRS was 
used. NIST612 glass and Madagascar apatite" were employed as the 
primary and secondary reference materials respectively, with “*Ca as 
an internal elemental standard”. 


Data availability 


All data are available online via PANGAEA at 
https://doi.org/10.1594/PANGAEA.906092. 


Code availability 


The standard model code of the ‘Community Earth System Models’ 
(COSMOS) version COSMOS-landveg r2413 (2009) is available upon 
request from the Max Planck Institute for Meteorology 
(Reinhard.Budich@mpimet.mpg.de). Analytical scripts are available via PANGAEA 
at https://doi.org/10.1594/PANGAEA.910179. 


Extended Data Fig. 1 | Tera-Wasserburg and PCA plots for U-Pb ages 
(in +Ma). a, Tera-Wasserburg diagram showing apatite (red; 9.9 mbsf) and 
zircon (blue; 26.7 mbsf) U-Pb data. The red bar at the upper array intercept for 
Eocene apatite is the range of crystalline basement 207Pbc/206Pbc values 
reported by (ref. '°*) for West Antarctica, which anchor the apatite age 
calculation. b, PCA plot showing trace-element data and single-grain ages (in 
Myr) for AWI-35 (9.9 mbsf) apatite, and lithological fields derived from a 
bedrock apatite reference library'*. Eocene grains (labelled in red) are 
chemically and chronologically distinct from other detrital apatite in the same 
sample. Data point error ellipses are 2σ. 

Extended Data Fig. 2 | Pollen abundance diagram. Percentages of the most 
abundant pollen and spores and their total counts in cores 9R and 10R at site 
PS104_20-2 are shown. 

Extended Data Fig. 3 | Photomicrographs of selected pollen and spores. 
a, Cyathidites australis. b, Osmundacidites wellmanii. c, Ruffordiaspora 
australiensis. d, Ruffordiaspora ludbrookiae. e, Cycadopites follicularis. 
f, Microcachryidites antarcticus. g, Phyllocladidites mawsonii. h, Podocarpidites 
major. i, Trichotomosulcites hemisphaerius. j, Trichotomosulcites 
subgranulatus. k, Taxodiaceaepollenites hiatus. l, Equisetosporites sp. 
m, Nyssapollenites chathamicus. n, Peninsulapollis gillii. o, Proteacidites 
subpalisadus. Scale bars, 10 µm. 

Extended Data Fig. 4 | HG palaeothermometry. Presence of HGs at 27.03-27.04 mbsf 
at site PS104_20-2 (core 9R) and river or lake surface water temperature 
(SWT) estimates from the HG-based molecular palaeothermometer (HT1;,). 

Extended Data Fig. 5 | Example microscopic images from thin sections. The 
sections are taken from a fossil root fragment between 29.34 and 29.43 mbsf in 
core 10R at site PS104_20-2. a, Overview scan of root fragment with indicated 
locations of detailed microscopic images b-e. White arrows indicate the 
locations of preserved parenchyma storage cells, including potential 
aerenchyma gas exchange cells (d). The scale bar in d applies to b-e. 

refs. *”°*). b, CPI (left) and pristane/phytane (Pr/Ph; right) ratios. The CPI points [...] respectively. 




Extended Data Table 1| Percentages of the most abundant pollen and spore taxa 


Depth (cmbsf) 
Core 

Section 

Core depth (cm) 


Gymnosperms 
Angiosperms 
Pteridophytes 
Bryophytes 


Araucariacites/Dilwynites 
Cycadopites follicularis 
Microcachryidites antarcticus 
Phyllocladidites mawsonii 
Podocarpidites spp. 
Taxodiaceaepollenites hiatus 
Trichotomosulcites spp. 
Nyssapollenites chathamicus 
Peninsulapollis spp. 
Proteacidites spp. 
Baculatisporites comaumensis 
Ceratosporites equalis 
Cyathidites spp. 
Gleicheniidites senonicus 
Laevigatosporites ovatus 
Lycopodiumsporites spp. 
Osmundacidites wellmanii 
Stereisporites antiquasporites 


Total Pollen Sum 
Pollen concentration (grains/g) 


2680 
020--2 
9R-1W 

55 
31.4 


23.8 
36.1 


13.7 


13.4 


357 
69320 


2685 
020--2 
9R-1W 

60 


69.1 
4.3 
21.7 


349 
55144 


2698 
020--2 
9R-1W 

73 


49.6 
74 
37.9 


356 
61895 


2702 
020--2 
9R-1W 

77 


63.1 


352 
121476 


2971 
020--2 
10R-1W 

111 


49.1 
20.6 
27.9 


344 
4250 


6900 


2984 
020--2 
10R-1W 

172 


63.2 
10.3 
23.7 


359 
7869 


Extended Data Table 2 | Key pollen taxa and the NLRs used to derive quantitative climate estimates 


Selected Pollen Taxa | Botanical Affinity (after Raine et al. 2011) | Selected NLRs for Bioclimatic Analysis 

Gymnosperms 
Araucariacites/Dilwynites | Araucariaceae (Araucaria, Agathis) | Araucaria 
Cycadopites follicularis | Gymnospermopsida | Cycadales 
Equisetosporites | Ephedraceae (Ephedra, cf. E. chinleana) | Ephedra 
Microcachryidites antarcticus | Podocarpaceae (Microstrobos, Microcachrys tetragona) | Podocarpus 
Phyllocladidites mawsonii | Podocarpaceae (aff. Lagarostrobos franklinii) | Lagarostrobos 
Podocarpidites ellipticus; P. major | Podocarpaceae (Podocarpus ?) | Podocarpus 
Podocarpidites otagoensis | Podocarpaceae (Podocarpus?, or Lagarostrobos) | Podocarpus 
Taxodiaceaepollenites hiatus | Cupressaceae, Taxodiaceae | Cupressaceae 
Trichotomosulcites subgranulatus | Podocarpaceae. Extinct Microcachrys | Podocarpaceae 

Angiosperms 
Liliacidites cf. variegatus | Liliaceae; Monimiaceae (cf. Laurelia novaezelandiae) | Liliaceae 
Peninsulapollis gillii; P. truswellia | Proteaceae | Proteaceae 
Proteacidites parvus | Proteaceae (Bellendena montana type) | Proteaceae 
Proteacidites minimus | Proteaceae (Knightia excelsa) | Proteaceae 

Pteridophytes 
Baculatisporites comaumensis | Osmundaceae (Osmunda, Leptopteris); Hymenophyllaceae (Hymenophyllum flexuosum, H. cruentum) | Hymenophyllaceae 
Ceratosporites equalis | Lycopodiaceae, Selaginellaceae (Selaginella, e.g. S. tenuispinulosa) | Selaginellaceae 
Cibotiidites tuberculiformis | Dicksoniaceae (cf. Dicksonia squarrosa, D. dissecta); Schizaeaceae | Dicksoniaceae 
Cyathidites australis; C. minor | Cyatheaceae (Cyathea), Dicksoniaceae, Schizaeaceae (Lygodium) | Cyatheaceae 
Gleicheniidites senonicus | Gleicheniaceae (Gleichenia circinata group, Dicranopteris) | Gleicheniaceae 
Laevigatosporites ovatus | Aspleniaceae, Blechnaceae, Polypodiaceae, Schizaeaceae | Polypodiaceae 
Lycopodiumsporites sp. | Lycopodiaceae (Lycopodium) | Lycopodiaceae 
Osmundacidites wellmanii | Osmundaceae (Todea barbara) | Osmundaceae 
Perotrilites majus | Selaginellaceae? | Selaginellaceae 
Polypodiisporites cf. minimus | Davalliaceae (Nephrolepis) | Davalliaceae 
Ruffordiaspora australiensis | Schizaeaceae | Schizaeaceae 

Bryophytes 
Stereisporites antiquasporites | Bryophyta; Sphagnum | Sphagnum 


Extended Data Table 3 | Full list of identified pollen and spore taxa 


Bryophytes 

*Aequitriradites spinulosus (Cookson & Dettmann) 
*Annulispora folliculosa (Rogalska) 

*Coptospora striata (Dettmann) 

*Foraminisporis cf. F. wonthaggiensis (Cookson & Dettmann) 
*Stereisporites antiquasporites (Dettmann) 


Pteridophytes 

*Baculatisporites comaumensis (Cookson) 
*Biretisporites sp. 

*Ceratosporites equalis (Cookson & Dettmann) 
Cibotiidites tuberculiformis (Cookson) 
*Crybelosporites striatus (Cookson & Dettmann) 
*Cyathidites cf. C. asper (Bolkhovitina) 
*Cyathidites minor (Couper) 


*Cyathidites cf. C. punctatus (Delcourt & Sprumont) 


*Cyathidites undiff. 

*Gleicheniidites senonicus (Ross) 
Herkosporites sp. 

*Laevigatosporites ovatus (Wilson & Webster) 
Lycopodiacidites cf. L. dettmannae (Burger) 
*Lycopodiumsporites sp. 

*Osmundacidites wellmanii (Couper) 
Polypodiisporites sp. 

*Perotrilites cf. P. majus (Cookson & Dettmann) 
*Reticulatisporites cf. R. pudens (Balme) 
*Retitriletes austroclavatidites (Cookson) 
Retitriletes cf. R. rosewoodensis (de Jersey) 
*Retitriletes undiff. 

*Ruffordiaspora australiensis (Cookson) 
*Ruffordiaspora ludbrookiae (Dettmann) 


Gymnosperms 

*Araucariacites/Dilwynites 

*Callialasporites dampieri (Balme) 
*Classopollis cf. chateaunovi (Reyre) 
*Cycadopites follicularis (Wilson & Webster) 


?Dacrydiumites praecupressinoides (Couper) 


*Equisetosporites sp. 

*Microcachryidites antarcticus (Cookson) 
*Phyllocladidites mawsonii (Cookson) 
*Podocarpidites cf. P. ellipticus (Cookson) 
*Podocarpidites cf. P. major (Couper) 
*Podocarpidites cf. P. otagoensis (Couper) 
*Podocarpidites undiff. 

*Podosporites sp. 

*Taxodiaceaepollenites hiatus (Potonie) 
*Trichotomosulcites hemisphaerius (Mays) 
*Trichotomosulcites subgranulatus (Couper) 
*Triletes undiff. 


Angiosperms 

?Beaupreaidites verrucosus (Cookson) 
*Cupuliferoidaepollenites cf. C. parvulus (Groot & Penny) 
*Liliacidites cf. L. intermedius (Couper) 
*Monosulcites undiff. 

*Nyssapollenites chathamicus (Mildenhall) 
Peninsulapollis gillii (Cookson) 
Peninsulapollis truswellia (Dettmann & Jarzen) 
*?Phimopollenites augathallaensis 
?Polycolporopollenites esobalteus 
Proteacidites parvus (Cookson) 
Proteacidites cf. P. subpalisadus (Couper) 
Proteacidites cf. P. subscabratus (Couper) 
Proteacidites minimus (Couper) 
Proteacidites sp. 

*Rousea georgensis (Brenner) 
*Tetracolpites sp. 

*Tricolpites cf. T. pachyexinus (Couper) 
*Tricolpites minutus (Brenner) 

*Tricolpites sp. 

Triorites sp. 


All taxa identified during the current study are included. Question marks show uncertain taxon identifications that require further study. *Taxa described from the Tupuangi Formation on the Chatham Islands****. 


Southern Ocean ecosystems are under pressure from resource exploitation and climate 
change’. Mitigation requires the identification and protection of Areas of Ecological 
Significance (AESs), which have so far not been determined at the ocean-basin scale. 
Here, using assemblage-level tracking of marine predators, we identify AESs for this 
globally important region and assess current threats and protection levels. Integration 
of more than 4,000 tracks from 17 bird and mammal species reveals AESs around sub- 
Antarctic islands in the Atlantic and Indian Oceans and over the Antarctic continental 
shelf. Fishing pressure is disproportionately concentrated inside AESs, and climate 
change over the next century is predicted to impose pressure on these areas, 
particularly around the Antarctic continent. At present, 7.1% of the ocean south of 40°S 
is under formal protection, including 29% of the total AESs. The establishment and 
regular revision of networks of protection that encompass AESs are needed to provide 
long-term mitigation of growing pressures on Southern Ocean ecosystems. 


The Southern Ocean—defined here as the circumpolar waters south of 
40°S—is home to a unique fauna and has an important role in biogeo- 
chemical cycles and the global climate system’. Past industrial sealing, 
whaling and demersal fishing caused marked perturbations from which 
some Southern Ocean ecosystems are only now starting to recover’. The 
harvesting of squid and toothfish continues** and interest is growing in 
the expansion of Antarctic krill (Euphausia superba) fisheries®. These 
target species are crucial prey for upper trophic organisms (krill is a key 
component of the Southern Ocean food web) and their potential depletion 
raises substantial concerns about the effects on Southern Ocean 
ecosystems’. Anthropogenic greenhouse gas emissions are simultane- 
ously causing large changes to the Southern Ocean’. Strong interest 
has therefore developed in the long-term conservation of the Southern 
Ocean, but authorities face the considerable challenge of implementing 
conservation goals within existing management frameworks’. 


A first step in meeting this challenge is to identify regions that should 
be considered for protection, for reasons such as their high biodiversity, 
biological productivity or particular importance for certain life-history 
stages of species®”. The distribution and demography of marine predators 
provides a viable basis for this°, particularly in the vast and remote 
Southern Ocean, where integrated ecosystem measures are difficult 
to obtain at management-relevant, ocean-basin scales”. Indeed, on- 
shore measures of Southern Ocean marine predators have been used 
as regional indicators of ecosystem status for several decades”. Spatial 
aggregations of predators at sea identify not only areas that are important 
to the predator species themselves (which depend on lower trophic 
levels®) but also areas of broader ecosystem importance, such as regions 
of elevated productivity and biomass at lower trophic levels". Combining 
information across predator species with diverse diets and life histories 
is essential for an ecosystem-wide approach that is less susceptible to 
factors that affect individual species”. There is a growing recognition 
of the value of tracking data for making decisions about conservation’. 


A list of affiliations appears at the end of the paper. 


Fig. 1 | AESs in the Southern Ocean. a, Tracking data from 17 predator species 
were used to model the habitat importance for each species. Black points 
indicate tracking data and yellow points indicate tagging locations’®. 
b, Combining these model outputs gives the overall habitat importance, and the 
upper decile of overall habitat importance delimits AESs (white contours). Black 
points indicate colony locations for the 14 colony-breeding species. 
c, AESs (blue) shown in context. Major oceanographic fronts are shown with grey 
lines: SAF, Sub-Antarctic Front; PF, Polar Front; SACCF, Southern Antarctic 
Circumpolar Current Front. 


Using predator tracking data to identify AESs 

In the Southern Ocean, many predator species with differing diets and 
movement patterns have been tracked”’. We synthesized tracking data 
from 4,060 individuals of 17 species (Fig. 1a) to provide a circumpolar 
assessment of regions of ecological importance in the Southern Ocean. 
We identified regions that were preferred by multiple predator species 
as indicators of high levels of lower trophic biomass and biodiversity, 
and refer to these regions as AESs”. Our definition of AESs is not the 
same as Ecologically and Biologically Significant Marine Areas or Key 
Biodiversity Areas. However, it is consistent with several of the criteria 
that are used for defining Ecologically and Biologically Significant 
Marine Areas or Key Biodiversity Areas, particularly biological productivity 
and diversity’, and so provides a similar qualitative, integrated 
assessment of biodiversity patterns. 

We assembled tracking data from 12 species of seabird and 5 species 
of marine mammal. The data were collected between 1991 and 2016". We 
used habitat-selection models (Methods, Supplementary Information, 
Extended Data Figs. 1-3) of individual predator species and then combined 
their spatial predictions to identify regions that were important 
to our full suite of species (Fig. 1b). This enabled us to account for 
incomplete tracking coverage (that is, colonies from which no animals were 
tracked) and predict habitat importance for each species across the 
entire Southern Ocean. Combined, these predictions provided an inte- 
grated and spatially explicit assessment of areas of high biodiversity and 
biomass at multiple trophic levels. Sea surface temperature (SST) and 
wind strength were most often the best predictors of habitat selectivity 
in these species-specific models (Extended Data Fig. 4). SST has been 
linked to global patterns of marine biodiversity’, and in the Southern 
Ocean it acts as an indicator of water masses with different ecological 
properties”. Wind exerts several influences—including driving ocean 
currents and mixing; transporting iron; affecting the dynamics of sea 
ice; and ultimately determining primary production”°—and has been 
linked, for example, to the global distribution of albatrosses and pet- 
rels”. The importance of other predictor variables differed among spe- 
cies (Extended Data Fig. 4). The relationship between habitat selectivity 
and environmental predictors differed across species, showing how 
species used their environments in different ways (Extended Data Fig. 5). 


Fig. 2 | Fishing effort in the Southern Ocean. a, Map showing fishing effort 
(total fishing hours between 2012 and 2016”’). Contour lines (white) indicate 
AESs. b, Kernel density plot showing the distribution of values of fishing effort 
(zero values not shown) inside (red) and outside (grey) AESs. Two-tailed 
permutation tests (n = 1,098,226 grid cells) indicate a significant difference 
(P < 0.001). c, Proportion of cells inside and outside AESs that had some (more 
than 0 h; yellow) or no (0 h; purple) fishing effort. 


Distribution of AESs 


Regions with the highest scores for overall habitat importance were 
identified as AESs (calculated as the upper decile of those scores). 
These were located over the Antarctic continental shelf (89% of AES 
pixels south of 60°S were over or within 200 km of the shelf) and in 
two northerly aggregations: one encompassing much of the Scotia 
Sea and surrounding waters, and the second covering the chain of 
sub-Antarctic islands from the Prince Edward Islands through to parts 
of the Kerguelen Plateau (Fig. 1c). Regions of lower importance were 
identified in the southern Pacific and Indian Oceans. The distribution 
of AESs is associated with the availability of suitable habitats for breed- 
ing and resting, as well as regional oceanography and sea-ice dynamics 
that affect biological production (Fig. 1c). The AESs were based on a 
combination of island-breeding and wholly pelagic species, and there- 
fore reflect broad-scale patterns of importance. These patterns are 
supported by: (i) broad-scale patterns of primary production (Southern 
Ocean land masses provide iron fertilization that stimulates down- 
stream production in this otherwise iron-limited ecosystem”); (ii) 
historical whaling catches north of 60°S, which show that relatively few 
whales were taken in the southern Indian or Pacific Oceans, and that 
the region identified as an AES in the south Atlantic corresponds with 
high whaling catches”; and (iii) previous estimates of Antarctic krill 
distribution, which suggest that concentrations are high in the south 
Atlantic and lower in the south Pacific and southern Indian Ocean. 
The AES in the south Atlantic corresponds to the area of increased krill 
biomass, whereas the AES in the Indian Ocean partially corresponds 
to a region dominated by myctophid fish and other euphausiids”. 
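
The upper-decile rule used to delimit AESs can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are a hypothetical list of per-cell habitat-importance values, the function name is invented, and the nearest-rank definition of the 90th percentile is a choice made here, since the text does not say which percentile convention was used.

```python
import math


def upper_decile_mask(scores):
    """Return (mask, threshold): mask[i] is True when scores[i] lies above
    the 90th percentile of all scores (nearest-rank definition, assumed)."""
    ranked = sorted(scores)
    cutoff_index = math.ceil(0.9 * len(ranked)) - 1
    threshold = ranked[cutoff_index]
    return [s > threshold for s in scores], threshold


# Hypothetical habitat-importance scores for 20 grid cells.
cell_scores = [float(v) for v in range(1, 21)]
aes_mask, cutoff = upper_decile_mask(cell_scores)
print(cutoff, sum(aes_mask))
```

With 20 distinct scores, two cells (10%) are flagged. Ties at the threshold would shrink the flagged set, which is one reason a real analysis might prefer an interpolated quantile.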


Exposure of AESs to potential stressors 

The Southern Ocean is subject to several stressors that influence its 
ecosystems, including an expansion of resource extraction and rapid 
climate change”°. We note that both temperature and wind—which were 
key parameters in many of our species-specific habitat models—are 
changing, and are projected to continue to do so”. 

Fishing has both direct effects on Southern Ocean biota through inci- 
dental bycatch and indirect effects through resource competition”®. Many 
demersal finfish were exploited during the latter part of the 20th century, 
which led to the decimation of some stocks in the Antarctic and 
sub-Antarctic’. Finfish fishing in the Antarctic is now regulated, and is focused 
on toothfish species caught with longlines. Fisheries for Antarctic krill 


began in the 1960s and are now concentrated in the south Atlantic sec- 
tor, most notably at the Antarctic Peninsula and South Shetland Islands, 
the South Orkney Islands and South Georgia’. Krill is managed with a 
low, precautionary catch limit that takes account of the key role of krill 
in the Antarctic food web. By global standards, fishing pressure in the 
Southern Ocean is low”, but indications are that pressure on its marine 
resources will grow””*. Fishing effort (Fig. 2a) was significantly different 
inside and outside of AESs (Fig. 2b), with a disproportionate amount of 
moderate-to-high effort (100 or more total hours of fishing) occurring 
inside AESs. Of cells with a moderate-to-high fishing effort, 37.9% were 
inside AESs, despite AESs only representing 10% of the study area. Areas 
of conspicuous fishing effort around southern South America, New Zea- 
land and Australia should be treated with caution, as our study does not 
include temperate predator species that are likely to figure prominently in 
these ecosystems (Fig. 2a). Nonetheless, relatively high-intensity areas of 
fishing that are directly relevant to the Southern Ocean occurred around 
the Falkland Islands (Islas Malvinas), where squid and some finfish are 
targeted; around South Georgia (ice fish, krill and toothfish); at the West 
Antarctic Peninsula (krill); and over the Kerguelen (toothfish and ice 
fish) and Campbell (squid and finfish) plateaux‘ °. Relatively important 
fisheries for toothfish also occur within the Ross Sea”’. 
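
A two-tailed permutation test of the kind reported for fishing effort inside and outside AESs can be sketched as follows. The test statistic (absolute difference in group means), the number of permutations, the function name and the example values are all assumptions for illustration; the text does not specify them.

```python
import random
from statistics import fmean


def permutation_test(inside, outside, n_permutations=2000, seed=0):
    """Two-tailed permutation p-value for a difference in means.

    Group labels are shuffled n_permutations times; the p-value is the
    share of shuffles whose absolute mean difference is at least as
    large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(fmean(inside) - fmean(outside))
    pooled = list(inside) + list(outside)
    n_inside = len(inside)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_inside]) - fmean(pooled[n_inside:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical fishing hours per grid cell.
hours_inside = [120, 150, 200, 180, 160, 140, 170, 190]
hours_outside = [5, 10, 0, 8, 12, 3, 7, 9, 4, 6]
print(permutation_test(hours_inside, hours_outside))
```

The smallest p-value this sketch can report is 1/(n_permutations + 1), so resolving P < 0.001 would need more than 1,000 permutations.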

The physical attributes of the Southern Ocean are changing. Sea ice 
is a critical component of high-latitude ecosystems and has central 
roles in oceanographic, biogeochemical and ecological processes. 
The biological consequences of sea-ice changes in the Southern Ocean 
include changes in breeding-site availability or access and prey avail- 
ability, and changes to the structure and function of ecosystems”. The 
pattern of sea-ice change in the Antarctic displays considerable regional 
and temporal variation. In the West Antarctic Peninsula, the extent of 
sea ice has declined markedly in recent decades, but has increased in 
other areas”. Most climate projections indicate that overall sea ice will 
decline over the next century””. Given the broad influence of both SST 
and wind on ecosystems, these components can also influence aspects 
of the biology of animals, including their breeding phenology, forag- 
ing success, survival and reproductive performance’®. However, when 
we contrasted the rates of change of sea-ice duration, SST and wind 
patterns inside and outside of AESs there were only slight differences, 
and considerable regional variation (Extended Data Fig. 6). The subtle 
nature of the differences in environmental change inside versus outside 
AESs does not negate the fact that the study area overall is undergo- 
ing marked changes in physical environmental processes, and that 
ecologically important areas are not being spared from these changes. 


Assessment of spatial management 


Management of marine systems is complex, especially in areas that lie 
beyond national jurisdiction® and where international effort is there- 
fore required, particularly for species that move between national 
and international waters*™. Relevant management includes traditional 
process-oriented tools such as individual species protection, stock 
assessments, decision rules and catch limits, as well as spatial tools 
such as marine protected areas (MPAs)*», but also altered fishing practices 
for mitigating bycatch**. In the high-latitude Southern Ocean, the 
Commission for the Conservation of Antarctic Marine Living Resources 
(CCAMLR) uses an ecosystem-based management framework that is 
intended to ensure that there are no long-term effects from fisheries 
on marine ecosystems”. This includes setting precautionary, spatially 
explicit catch quotas and a call for the establishment of a network of 
MPAs, the design considerations of which can include the potential to 
provide climate change refugia and the inclusion of reference areas to 
help separate the effects of fishing from climate-related environmental 
change. Both approaches will benefit from better understanding of the 
locations of AESs. Outside the CCAMLR framework, MPAs have also 
been established by sovereign management authorities around some 
sub-Antarctic islands (Fig. 3a). Several other MPAs are currently under 
development, including within CCAMLR and by national authorities 
(Fig. 3a). However, the level of protection afforded by any individual 
MPA depends on its governance structure and the type and level of 
permitted activities (for example, fishing)***. 


Fig. 3 | Spatial protection of Southern Ocean AESs. a, Current (orange) and 
proposed (magenta) MPAs superimposed on overall habitat importance. White 
contours denote AESs, black lines show national Exclusive Economic Zones and 
the blue line shows the CCAMLR Convention Area. b, Area in current (orange) 

An appropriately designed network of protected areas can help to 
buffer the effects of climate change and reduce the effect of stressors 
such as bycatch or competition from fisheries”. We therefore quantified 
the coverage and placement of individual MPAs with reference to identi- 
fied AESs. Overall, 7.1% of the ocean south of 40°S is currently protected 
by MPAs, and this would increase to 11.2% if all currently proposed MPAs 
were implemented (Fig. 3b). This already meets, in a regional setting, the 
global Aichi Biodiversity Target 11 of 10% by 2020. The level of protection 
of the Southern Ocean is high by global standards: only 3.6% of the world’s 
oceans has MPA status at present, increasing to 7.3% with the addition of 
planned and announced MPAs**. However, protection needs to be targeted 
at areas of high conservation value, including those that are important for 
the persistence of biodiversity’. Existing MPAs cover 27% of the AESs identi- 
fied (Fig. 3b). Southern Ocean MPAs are predominantly in sub-Antarctic 
regions, and here they show high levels of congruence with AESs (Fig. 3a). 
Of note is the Davis Bank region, south of the Falkland Islands (Islas 
Malvinas), where there are high levels of fishing inside AESs (Figs. 1, 2a, b). This 
area is now part of an MPA that was recently implemented by Argentina 
(Fig. 3a). Adoption of proposed MPAs for the Antarctic continental margins 
would raise the MPA coverage of AESs to 39% (Fig. 3b), including areas in 
East Antarctica, the Weddell Sea and the Antarctic Peninsula. The largest 
total AESs (4.0 million km²; 56% of AESs) are under CCAMLR jurisdiction 
(Fig. 3a, c), followed by 1.9 million km² (27% of AESs) in national waters 
(Exclusive Economic Zones), and only 1.2 million km² (16% of AESs) are 
outside the CCAMLR Convention Area and national waters (Fig. 3c). 
Implementation of MPA proposals would benefit Southern Ocean ecosystems, 
especially those in the Antarctic Peninsula, East Antarctic and Weddell Sea. 


Likely effects of future climate change 


We estimated the likely effects of future climate change on the distribution 
of AESs under two representative concentration pathway (RCP) 
simulations: a medium-forcing scenario (RCP4.5) and a more extreme, 
high-forcing scenario (RCP8.5)*°. For each scenario, eight global climate 
models—considered to be most suitable for Southern Ocean studies 
owing to their reliable reproduction of extant sea-ice conditions—were 
used to predict the locations of AES-like habitats in 2100. Here we dis- 
cuss only the RCP8.5 results, as current emissions of carbon dioxide are 
in line with this scenario”. Results for the moderate RCP4.5 scenario 
are presented in Extended Data Fig. 7. There was an overall reduction in 
the AES-like area (−3.3%), partitioned into an increase in sub-Antarctic 
AES-like cells (+5.7%) and a decrease in Antarctic AES-like cells (−10.2%) 
that outweighed this increase. 

In the sub-Antarctic, AES-like areas generally moved south (Fig. 4a),
resulting in an overall growth in the area of sub-Antarctic AESs (Fig. 4b). 
This general southward migration of important habitat is consistent 
with projections for individual predator species (for example, king 
penguins (Aptenodytes patagonicus)), as well as for other species 
including krill and salps****. The advantages that predators gain from 
the overall increase in the area of sub-Antarctic AESs may be offset by 
the increased cost of travel to more-distant foraging grounds—at least 
for central-place foragers that dive (penguins and fur seals)—whereas 
volant species (albatrosses and petrels) or those that are unconstrained 
by terrestrial breeding sites (whales) may benefit from increased sub- 
Antarctic foraging opportunities*. Changes in the future distribution of 
AES-like areas along the Antarctic margin are more spatially heteroge- 
neous, with areas where AESs are lost interspersed with areas where they 
are gained or retained (Fig. 4a). However, there will be a net loss (-10.2%) 
of AES-like cells in the CCAMLR Convention Area (Fig. 4b). The heterogeneity
of this pattern is in part a result of the dynamic nature of the
high-latitude Antarctic marine environment and the uncertainty across 
a number of climate-model variables in this region. This uncertainty 
is due to the variability in the skill of models in reproducing current 
climate, and the large range of projected responses from those models. 
Our projections are based on unchanged future availability (that is, 
colony locations and sizes) and species–environment relationships.
However, as species adapt to future pressures and changes to their 
available breeding habitat, populations are likely to change both their 
preferred colony locations and habitat usage. Sub-Antarctic-breeding
species have limited availability of alternative breeding sites, but colony
sizes might change. Ice-breeding species might be able to relocate,
and land-breeding species that require ice-free terrain might be able
to occupy previously vacant areas, or some might move to regions
that become ice-free owing to changing local conditions*®. The loss
of AES-like habitat on the Antarctic margin that our models project
suggests that these populations will be under pressure as the climate
continues to change, and therefore continued monitoring of these
species and ongoing assessment of the effectiveness of management
actions (for example, MPAs) will be important. Monitoring of colonies
will need to detect local colonizations, particularly when populations
are small*. As part of the designation of MPAs within CCAMLR,
research and monitoring plans are required; these plans
should, among other factors, consider changes to species–environment
relationships and other dynamic processes within and adjacent
to the protected area, given the pressures of ongoing climate change.

Fig. 4 | Projected change in the distribution of AESs under RCP8.5. a, Cells
that were AESs in the original results are shown in blue (remain as AES) or orange
(become non-AES in the future). The gradation from orange to blue shows the
proportion of climate models that indicate loss (orange) or retention (blue) of
AESs. Similarly, the gradation from white to green shows the proportion of
models that indicate that non-AES cells will remain as non-AES (white) or
become AES (green). Orange and magenta outlines show current and proposed
MPAs, respectively. b, Percentage change in the area of AESs according to the
eight different climate models (black points), and the mean of these (red points).
In the box plots, the box indicates the 25th–75th percentiles, and the whiskers
extend to the smallest value or largest value that is not further than 1.5 times the
interquartile range from the 25th or 75th percentile, respectively.

There was a mixed response across the eight climate models, with 
changes in the number of AES-like cells that are included in current 
MPAs ranging from -8.7% to +8.4% (Fig. 4b). When the proposed MPAs
were included (current + proposed MPAs in Fig. 4b), all climate models
indicated a decrease (between -16.9% and -0.9%) in the number
of AES-like cells within MPAs. This suggests that proposed MPAs are 
in areas that are projected to become less similar to existing AESs by 
2100. Any protection afforded by MPAs in such areas could provide 
better medium-term opportunities for populations to adapt, as they 
will not have to cope with both climate change and other stressors 
during that period. 


Conclusion 


Our work provides strong evidence in support of the ecological impor- 
tance of existing and proposed Southern Ocean MPAs. By integrating 
tracking data from a suite of predators, we identified regions that are 
likely to have high biodiversity and biomass of the prey (and concomi- 
tant ecosystems) of the animals that were tracked. Our AESs are clearly 
candidates for protection, and the implementation of the proposed 
MPAs within the CCAMLR region would greatly increase the protection 
of important habitats in the Southern Ocean. Several MPA proposals 
have failed to reach consensus within the CCAMLR process, and even 
when adopted result in MPAs with varying degrees of protection. Many 
sources of input are needed to establish MPAs, but the AESs that we 
have described here will help to make the scientific case in this mul- 
tifaceted process”** by providing an ecosystem-level analysis of the 
areas that most warrant protection. The design of MPAs should also 
consider future conditions. Pressures on AESs owing to climate change 
will affect all parts of the Southern Ocean, but their effects are likely to 
be strongest along the Antarctic margin. The responses of species to 
these pressures are currently difficult to predict, highlighting the need 
for continued monitoring as part of ongoing management actions. 
Because only 16% of all Southern Ocean AESs are outside the CCAMLR 
Convention Area or national waters, the responsibilities for these future 
actions lie mostly with CCAMLR members and those nations with sover- 
eign territory in the sub-Antarctic. Adaptive management approaches 
to conservation measures (including MPAs) will be necessary to deal 
with these future changes in a timely way. The Southern Ocean can be 
an exemplar of how science, policy and management can interact to 
meet the challenges of a changing planet. In the Southern Ocean, these 
challenges will be considerable, and will include increased fishing pres- 
sure as the global demand for marine resources grows”’. Our results 
highlight where future science-informed policy efforts might best be 
directed, including both adaptive spatial protection and improved 
robust management of fisheries. Similar synthetic approaches should 
capitalize on the increasing amount of tracking data that are being 
collected through large-scale initiatives*° to indicate regions in need 
of protection globally. 

Methods


Data reporting 

No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators 
were not blinded to allocation during experiments and outcome 
assessment. 


Analytical overview 
We assembled tracking data from 17 species of seabirds and marine 
mammals, collected between 1991 and 2016, from across the Antarc- 
tic predator research community”. Birds and mammals comprise 
the majority of top predator species in the Southern Ocean, which 
has few other large, highly mobile marine predator taxa (bony and 
cartilaginous fishes). These include toothfish, southern bluefin tuna 
(Thunnus maccoyii, which occur in the northernmost part of our 
study area) and a small number of shark species. Very few of these 
fish and shark species have been tracked, with very few tracking data 
available south of 40°S*™. Although some bias might result from our 
use of species, this does not detract from the underlying logic of our 
approach: that by using the at-sea distributions of an ecologically 
diverse suite of predators we can identify areas of ecological impor- 
tance. Our dataset represents 4,060 individual tracks and more than 
2.9 million location estimates (Fig. 1a). After filtering and quality
control, we retained 2,823 tracks comprising 2.3 million locations”®. 
The approximately 30% of tracks that were excluded were those with 
poor-quality location fixes that could not be properly filtered, tracks 
from individuals that did not actually depart the colony, or tracks 
with other problems detected during the rigorous quality control 
process that we implemented. The full process is described in our 
companion data paper”, which makes available all of the data for 
use by the broader community, without providing further analytical 
investigation to consider the matters raised here. The environmen- 
tal covariate values along each of these tracks (the ‘used’ habitat) 
were compared statistically with the habitat available to each ani- 
mal, thereby allowing the habitat selection of each species to be 
determined**? (Extended Data Figs. 1, 2). We fitted habitat-selection 
models for different life-history stages within a species. Despite the 
considerable size of the dataset, it is not an exhaustive representation 
of animals from all known colonies (for central-place foragers) or 
geographic regions (for non-central-place foragers). To account for 
incomplete tracking coverage, we used the fitted habitat-selection 
models to map habitat importance for each life-history stage of each 
species across the entire Southern Ocean, including areas around 
colonies without tracking deployments (Extended Data Fig. 3). For 
each species, we calculated the average habitat importance across 
life-history stages. For colony-breeding species, colony sizes were 
used to weight the habitat-importance values, upweighting areas 
that were of importance to large colonies (Extended Data Fig. 8). 
Southern Ocean predator species can be clustered into Antarctic and 
sub-Antarctic species (Extended Data Fig. 9). We mapped assemblage- 
level habitat importance (Extended Data Fig. 10) for each of these two 
groups (hereafter ‘overall habitat importance’ maps) by averaging 
across species-level maps. To calculate the overall map, we took the 
maximum of the two assemblage-level importance values in each 
cell. Areas with high values of overall habitat importance (in the 
top decile of values) indicate areas that are attractive to many spe- 
cies; these represent AESs”. We then compared the overall habitat- 
importance values inside and outside AESs in the context of fishing 
effort and changes in physical environmental conditions (duration of 
sea-ice cover, SST and wind speed). We finally quantified the spatial 
protection afforded to AESs under current and proposed spatial 
management plans. 

We describe the methods in more detail in the Supplementary Information.
We conducted all the analyses in R.


Tracking data 

The data represent the output from a variety of types of tracking tags, 
providing location estimates at different spatio-temporal resolution 
and accuracy. We applied a state-space model* to estimate the most- 
probable locations at regular temporal intervals, while accounting for 
potential errors in the location estimates with automatic and manual 
quality control before and after filtering’®. Although this procedure 
does not make the track from a light-based tag as accurate as one from
a GPS device, it does provide a consistent characterization of the positional
accuracy across different tag types, allowing the uncertainty in
position to propagate into the uncertainty in the parameters of the fitted
movement model and in the track simulation step (see below). We
note that the location errors of light-based (GLS) tags are larger than the
resolution of the grids used, especially near the poles, which may be
problematic for the analyses.
However, the light-based tag deployments were made almost exclusively
on sub-Antarctic animals (albatrosses and fur seals). The spatial
scale of our results (AESs) in the sub-Antarctic zone (around 5 million
km²) is considerably larger than the probable scale of positional error
of light-based tags (around 100 km) and so we do not believe that using
a mixture of tag types has adversely affected our results.


Life-history stages 

Most of the species in the study are central-place foragers (that is, 
they return periodically to land or sea ice to breed, moult or rest). The
constraints faced by these predators at different stages in their life- 
history cycle mean that their movements differ markedly across these 
stages. We therefore fitted models separately for up to five predefined 
life-history stages in the breeding cycle of each species. We automati- 
cally assigned tracks to these stages on the basis of calendar date, with 
manual reassignment where necessary following examination of indi- 
vidual movement patterns. This resulted in 40 data subsets (17 species 
with 1-4 life-history stages) with sufficient data for habitat-selection 
modelling (Supplementary Table 1). 
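
As an illustration only, the calendar-based part of this assignment can be sketched in Python (the analyses themselves were run in R). The stage names and date windows below are hypothetical placeholders, not the species-specific dates used in the study, and the manual reassignment step is not represented.

```python
from datetime import date

# Hypothetical stage windows as (month, day) pairs. The real windows are
# species-specific and are not given in this section.
STAGES = {
    "incubation": ((11, 1), (12, 15)),
    "chick-rearing": ((12, 16), (2, 28)),
}


def assign_stage(d, stages=STAGES):
    """Return the first stage whose calendar window contains date d, else None."""
    md = (d.month, d.day)
    for name, (start, end) in stages.items():
        if start <= end:
            inside = start <= md <= end
        else:
            # The window wraps around the new year (for example, Dec to Feb).
            inside = md >= start or md <= end
        if inside:
            return name
    return None


# A location recorded on 10 January falls in the wrapped chick-rearing window.
stage = assign_stage(date(2010, 1, 10))
```

Tracks whose dates fall outside every window return `None` here; in practice such cases would be candidates for the manual review described above.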


Simulating tracks to estimate available space 

The observed locations only provide information about where ani- 
mals occur, not about where they could have gone. To estimate the 
geographic space potentially available to animals, we simulated sets of 
tracks for each observed track. For each observed track, we simulated 
50 tracks using the movement model described above®. This yielded 
simulated tracks with movement characteristics (distributions of step 
length and turning angle) that are the same as the observed track, but 
they are random and independent of environmental effects. Thus, the 
simulated tracks provide an estimate of the geographic space that each 
animal could have occupied (given its movement characteristics and 
track length) if it had no habitat preferences. The environmental differ- 
ences between the available geographic space and the used geographic 
space allow the habitat selection of the organisms to be estimated, as 
detailed below. Locations at the animal’s home colony, and locations 
at known terrestrial resting sites, were fixed at the corresponding time 
and date in the simulated tracks to accurately simulate central-place 
foraging behaviour (Supplementary Information). 
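
The idea of simulating tracks that share the movement characteristics of an observed track can be sketched as follows. This is a deliberately simplified stand-in, not the fitted movement model used in the study: it resamples the observed step lengths and turning angles on a flat plane, and it omits the fixing of colony and resting-site locations. All function names are illustrative.

```python
import math
import random


def steps_and_turns(track):
    """Step lengths and turning angles of a track given as (x, y) points."""
    lengths, headings = [], []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        lengths.append(math.hypot(x1 - x0, y1 - y0))
        headings.append(math.atan2(y1 - y0, x1 - x0))
    turns = [b - a for a, b in zip(headings, headings[1:])]
    return lengths, turns


def simulate_track(track, rng):
    """One random track with the same start point and number of locations."""
    lengths, turns = steps_and_turns(track)
    x, y = track[0]
    heading = rng.uniform(-math.pi, math.pi)  # random initial direction
    sim = [(x, y)]
    for _ in lengths:
        step = rng.choice(lengths)            # resample an observed step length
        heading += rng.choice(turns) if turns else 0.0
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        sim.append((x, y))
    return sim


observed = [(0, 0), (1, 0), (2, 1), (2, 3), (4, 4)]
rng = random.Random(1)
# 50 simulated tracks per observed track, as in the text.
available = [simulate_track(observed, rng) for _ in range(50)]
```

Because the simulated steps ignore the environment, the set of simulated locations approximates where the animal could have gone in the absence of habitat preference.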


Environmental data 

To characterize the biophysical environment at observed and simulated 
locations, we compiled a suite of 19 environmental covariates (Extended 
Data Fig. 2, Supplementary Table 2) and extracted the value of these at 
each location. The covariates were remotely sensed, measured in situ 
or model-estimated and represent biophysical features that influence 
the movement, distribution and density of marine predators”. It 
was not computationally feasible to temporally match environmental 
data to each location estimate. Rather, we created a climatology that 
spanned each tracking data subset (each combination of species and life-history
stage), using the predefined stage dates. We took the mean
(or standard deviation) of the environmental data that fell on these
days of the year (stage dates) over the whole study period (November 
1991 to June 2016). Some covariates (for example, salinity difference) 
were only available as monthly climatologies, and we used the months 
corresponding with the stage dates to calculate the mean (or standard 
deviation). All covariates were resampled to a 0.1° × 0.1° grid; hereafter
we refer to the pixels of this grid as ‘cells’. We checked the covariates for 
each data subset for missing values and if more than 10% of values were 
missing we excluded the covariate from that model. This influenced 
mainly the chlorophyll a concentration variable, which was excluded 
from 17 of the 40 habitat models (Supplementary Table 1). This affected 
life-history stages with a large proportion of winter days, as chloro- 
phyll a has poor winter satellite coverage owing to being obscured by 
extensive cloud cover. However, chlorophyll a was rarely an important 
predictor in the models in which it was included; thus, excluding it from 
models probably had only a negligible effect. 
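
A minimal sketch of the two rules described above, for a single covariate in a single cell, might look like this. The data layout (a dictionary keyed by year and day of year) and the function names are assumptions made for illustration.

```python
def climatology(daily, stage_days):
    """Mean of a covariate over the stage's days of year, pooled across years.

    daily maps (year, day_of_year) -> value, with None marking missing data.
    """
    values = [v for (year, doy), v in daily.items()
              if doy in stage_days and v is not None]
    return sum(values) / len(values) if values else None


def keep_covariate(extracted, max_missing=0.10):
    """Exclude a covariate when more than 10% of its extracted values are missing."""
    missing = sum(v is None for v in extracted)
    return missing / len(extracted) <= max_missing


daily = {(2000, 1): 1.0, (2001, 1): 3.0, (2001, 2): None, (2000, 200): 10.0}
stage_mean = climatology(daily, stage_days={1, 2})  # pools day 1 of both years
```

The same pooling applies to the standard deviation; only the summary function changes.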


Habitat-selection models 

We used a habitat-selection modelling framework® to model and 
predict the space use of marine birds and mammals of the Southern 
Ocean. These models use the observed locations of each individual 
animal and an estimate of the geographic space available to each indi- 
vidual, along with covariates that characterize their environment. The 
environmental differences between the habitat that was used and the 
habitat that was available allow the habitat selection of the organisms 
to be estimated. To fit the models, we used boosted regression trees, a 
machine-learning algorithm that produces an ensemble of regression 
trees that have been iteratively fitted in a boosting process to improve
accuracy”. We tested several other algorithms but boosted regression 
trees showed the best predictive performance in another study® and 
in our tests. For a given location, the response variable was whether 
the location was an observed or simulated (available) location, and the
explanatory covariates were the associated environmental covariates. 
Boosted regression trees have four parameters that must be set: the 
number of trees (boosting iterations), the maximum tree depth, the 
learning rate (shrinkage) and the minimum number of observations 
in a node. We chose these values as the combination that maximized
the area under the receiver operating characteristic curve (a measure 
of model predictive performance) during tenfold cross-validation. We 
also used this metric to evaluate the final fitted models. We used the 
fitted model to generate spatial predictions for the entire study region 
and we estimated the uncertainty associated with these predictions 
using a bootstrap approach (Supplementary Information).
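
The model-selection criterion can be illustrated without any modelling library. The sketch below computes the area under the receiver operating characteristic curve (AUC) from used/available labels and predicted scores, then picks the parameter combination with the best mean cross-validated AUC; because a higher AUC means better discrimination, the best combination is the one with the largest value. The parameter tuples and AUC values in the example are invented.

```python
def auc(labels, scores):
    """AUC via the rank-sum identity.

    labels: 1 for observed (used) locations, 0 for simulated (available) ones.
    Returns the probability that a random used location outscores a random
    available one, counting ties as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_setting(cv_auc):
    """Parameter tuple with the highest mean AUC across cross-validation folds."""
    return max(cv_auc, key=lambda k: sum(cv_auc[k]) / len(cv_auc[k]))


# Hypothetical grid: (n_trees, max_depth, learning_rate, min_obs_in_node)
cv_auc = {
    (1000, 3, 0.01, 10): [0.80, 0.82],
    (2000, 5, 0.01, 10): [0.85, 0.87],
}
chosen = best_setting(cv_auc)
```

In a full workflow each list would hold ten fold-level AUC values, one per fold of the tenfold cross-validation.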


Accessibility model 
The modelling procedure described above does not account for the 
accessibility of a given location to an individual animal (in effect, it 
estimates the habitat selection of a given location in terms of its envi- 
ronmental characteristics, but without considering whether or not the 
animal could actually reach that location). For central-place foragers 
in particular, this is an important consideration. We therefore used a 
second set of models to account for this. We modelled accessibility 
in terms of the number of observed plus simulated locations in a given
cell as a function of the distance of the cell to the deployment colony. 
We fitted binomial models with a smooth, monotonic decreasing con- 
straint°’, under the assumption that the accessibility of cells should 
decrease with geographic distance. To estimate uncertainty, we sam- 
pled curves from the posterior distribution of each fitted accessibility 
model to use in a bootstrap approach (Supplementary Information).
We used these models to predict the accessibility of each cell over 
the study region to each species during each life-history stage (that is,
given the distance of a cell from a colony, the fitted accessibility model
provides an estimate of the probability that animals from that colony 
would be able to visit that cell). For colony-breeding species (those 
other than humpback whales, crabeater and Weddell seals), colony
sizes were used to weight this accessibility estimate: for a given cell,
the accessibility from all known colonies of that species was calcu- 
lated. A weighted mean of these accessibilities was then taken, using 
colony sizes as weights. Thus, this weighted accessibility represents 
the probability that a randomly selected individual from the global 
population would be able to visit that cell, effectively upweighting 
cells in the vicinity of large colonies. 
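
The colony-size weighting can be sketched as below. The exponential curve is only a placeholder for the fitted monotonic decreasing accessibility model, and its 500 km scale, the colony coordinates and the colony sizes are all invented for illustration.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def accessibility(distance_km, scale_km=500.0):
    """Placeholder decreasing curve standing in for the fitted model."""
    return math.exp(-distance_km / scale_km)


def weighted_accessibility(cell, colonies):
    """Colony-size-weighted mean accessibility of one cell.

    cell: (lat, lon); colonies: list of (lat, lon, size).
    """
    total = sum(size for _, _, size in colonies)
    return sum(
        size * accessibility(great_circle_km(cell[0], cell[1], lat, lon))
        for lat, lon, size in colonies
    ) / total


colonies = [(-54.0, -38.0, 1000), (-46.9, 37.8, 10)]  # one large, one small
near_large = weighted_accessibility((-54.5, -38.0), colonies)
near_small = weighted_accessibility((-47.0, 38.0), colonies)
```

A cell next to the large colony scores far higher than a cell next to the small one, which is the upweighting effect described above.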

For the non-colony breeding, ice-associated seals (crabeater and 
Weddell seals), we modelled accessibility as a function of distance 
beyond the ice edge (15% ice concentration contour), rather than dis- 
tance to the colony. For humpback whales, we assumed that the whole 
study area was equally accessible. 


Transforming output and combining models to predict habitat 
importance 

The habitat-selection models predict the value of the habitat at a 
location given that the animals could access that location. The pre- 
dictions of the habitat-selection models were therefore multiplied 
by the predictions of the accessibility models to yield an index that 
reflects both the habitat selection of each cell and its accessibility to 
the animals. This is not an estimate of the probability of a species using 
a given cell, because that probability also depends on the prevalence 
of the species*’. As prevalence varies between species, our habitat- 
selection estimates cannot be compared directly between species. 
We therefore partitioned the cells into decreasing percentiles based 
on area™ to obtain a map of habitat importance expressed in terms of 
area (for example, cells with values of 90 or higher represent the top 
10% most-important habitat by area for that species). We refer to this 
as habitat importance, and these maps can be compared among spe- 
cies. To create a single habitat-importance layer for each species, we 
averaged the stage-specific habitat-importance layers. 
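
A sketch of the transformation for one species and one life-history stage follows. It multiplies the two model outputs and converts the result to an area-based percentile, so that a value of 90 or more marks the top 10% of habitat by area. Cell areas are passed in explicitly because cells on a regular longitude-latitude grid shrink towards the pole; the function name and the handling of ties (arbitrary order) are assumptions.

```python
def habitat_importance(selection, access, area):
    """Area-based percentile (0-100) of selection x accessibility for each cell."""
    index = [s * a for s, a in zip(selection, access)]
    order = sorted(range(len(index)), key=lambda i: index[i])  # ascending
    total = sum(area)
    importance = [0.0] * len(index)
    below = 0.0
    for i in order:
        # Percentage of the total area held by cells with a lower index value.
        importance[i] = 100.0 * below / total
        below += area[i]
    return importance


scores = habitat_importance(
    selection=[0.1, 0.9, 0.5, 0.7],
    access=[1.0, 1.0, 1.0, 1.0],
    area=[1.0, 1.0, 1.0, 1.0],
)
# scores == [0.0, 75.0, 25.0, 50.0]
```

Because the output is a rank by area rather than a raw model score, layers built this way are on a common scale and can be averaged across stages and compared among species.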


Species-specific habitat importance 

We calculated community-level habitat importance by averaging the 
species-specific maps of habitat importance. Sub-Antarctic regions 
are naturally more species-diverse than those of the Antarctic, and 
so a simple average of all species together tended to strongly favour 
sub-Antarctic areas simply because of their greater species diversity. 
To account for the differences in species richness between the Ant- 
arctic and sub-Antarctic, we first defined two species groups using 
an unweighted pair group method with arithmetic mean (UPGMA) 
hierarchical clustering with Manhattan distance, applied to habitat- 
importance scores (Extended Data Fig. 9). This produced two clear 
groups: an Antarctic species group (emperor penguin, crabeater seal, 
Antarctic petrel, Adélie penguin and Weddell seal) and a sub-Antarctic
species group (Antarctic fur seal, black-browed albatross, wandering
albatross, sooty albatross, grey-headed albatross, king penguin, macaroni
and royal penguin, light-mantled albatross and white-chinned petrel).
The wide-ranging humpback whales and southern elephant seals
did not clearly fall into either cluster, and so were treated as belonging 
to both groups. The mean habitat importance was calculated for each of 
these groups separately and then combined (Extended Data Fig. 10) by 
taking the maximum of the two values (Antarctic and sub-Antarctic) in 
each pixel. We refer to this final layer as the overall habitat importance. 
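
The combination step reduces to a cell-wise mean within each group followed by a cell-wise maximum across the two groups. The sketch below assumes each species layer is a flat list of habitat-importance values over the same cells; the layer values are invented.

```python
def cell_mean(layers):
    """Cell-wise mean across equally sized species layers."""
    return [sum(vals) / len(vals) for vals in zip(*layers)]


def overall_importance(antarctic, sub_antarctic, both=()):
    """Cell-wise maximum of the two group means.

    Layers in `both` (species assigned to both groups) enter each group mean.
    """
    ant = cell_mean(list(antarctic) + list(both))
    sub = cell_mean(list(sub_antarctic) + list(both))
    return [max(a, s) for a, s in zip(ant, sub)]


overall = overall_importance(
    antarctic=[[90, 10]],
    sub_antarctic=[[20, 80], [40, 60]],
)
# overall == [90.0, 70.0]
```

Taking the maximum rather than a pooled mean keeps a cell that matters greatly to the smaller Antarctic group from being diluted by the more numerous sub-Antarctic species.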


AESs 
To identify the most-important areas, we calculated the 90th percentile 
(top decile) of the overall habitat-importance values. Cells with overall 
habitat-importance values above this threshold together comprised 
AESs. 
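
The thresholding itself is a one-line percentile cut. The sketch below uses Python's standard library and treats every cell equally; whether the percentile in the study was weighted by cell area is not stated here, so equal weighting is an assumption.

```python
from statistics import quantiles


def aes_mask(overall):
    """True for cells whose overall habitat importance exceeds the 90th percentile."""
    threshold = quantiles(overall, n=10, method="inclusive")[-1]
    return [v > threshold for v in overall]


mask = aes_mask(list(range(1, 101)))  # 100 cells with values 1..100
n_aes = sum(mask)                     # the ten highest-valued cells
```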


Environmental pressures 

To assess past environmental stressors on the Southern Ocean eco- 
system, we calculated change in SST, wind speed and sea-ice duration.
We selected SST and wind because they were frequently the
most-important predictor variables in the habitat models (Extended
Data Fig. 4), and sea-ice concentration as this was an important predic- 
tor for Antarctic species. Moreover, these variables are considered 
to be important drivers of ocean and ecosystem dynamics, and 
key axes on which environmental change in the Southern Ocean has 
been detected”. For each cell, we calculated the change in SST (°C) or 
wind speed (m s⁻¹) as the difference between mean SST or wind speed
in 1987-1999 and 2007-2017. For sea-ice duration, we calculated the 
difference in the mean number of days per year that each pixel had a 
sea-ice concentration of higher than 15%, for the same periods. These 
periods represent the decades at the beginning and end of a 30-year 
period that covers our study period. Thirty years is also the recom- 
mended period for climate assessments”. We also obtained data 
on fishing effort—which is considered to be a major environmental 
stressor in many regions of the Southern Ocean””’—from the Global 
Fishing Watch dataset, covering the period from 2012 to 2016””. We 
compared the values of these four stressors in the AESs and outside 
cells using random permutation tests with 10,000 permutations. The 
null hypothesis is that stressor values inside and outside AESs are from 
the same distribution. 
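
A random permutation test of this kind can be sketched as follows. The test statistic is not specified in this section, so the difference in means used here is an assumption, as are the example values.

```python
import random


def permutation_test(inside, outside, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed difference and the permutation p value.
    """
    rng = random.Random(seed)
    k = len(inside)
    pooled = list(inside) + list(outside)
    observed = sum(inside) / k - sum(outside) / len(outside)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign cells to 'inside' and 'outside' at random
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so that p is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


inside_aes = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4]    # stressor values in AES cells
outside_aes = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]   # stressor values elsewhere
diff, p_value = permutation_test(inside_aes, outside_aes, n_perm=2000)
```

A small p value indicates that a difference as large as the observed one rarely arises when the inside/outside labels are shuffled, which is evidence against the null hypothesis of a common distribution.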


Future projections of AESs 
Our predicted AESs (under current environmental conditions) are 
determined by both the oceanographic and climatic conditions of an 
area, as well as the accessibility of that area to each of our species of 
interest. In principle it would be possible to use future projections of 
environmental data and accessibility along with our fitted models to 
obtain future projections of AESs. However, some predictor variables 
are not available from the climate models used for the future projec- 
tions, and although other variables might appear to be available, they 
have different properties owing to factors such as different temporal 
and spatial resolution in the output, or the ability of the climate model 
to resolve the relevant processes. For example, sea surface height from 
satellite altimetry gives information about frontal and mesoscale fea- 
tures. Yet, although sea surface height is available as an output from 
many CMIP5 models, those models do not explicitly resolve mesoscale
features® and so the model-output data for sea surface height will not 
be acting as a proxy for the same oceanographic properties as the data 
from satellite-derived altimetry. 

To assess future distributions of AES-like habitat, we therefore used
a k-nearest neighbour classifier approach that is conceptually similar 
to climate analogues. For each grid cell we compiled current (end of 
20th century) environmental conditions, as well as projected condi- 
tions at the end of the 21st century from climate models (see below). In 
terms of accessibility, most of our study species breed in colonies, and 
‘accessibility’ for these species is determined both by the geographic 
distribution of their colonies and by the colony sizes. Future projec- 
tions of colony location and size do not exist for our study species at 
present, although initial work has begun for some species, such as 
king penguins**. Colony locations and sizes were therefore assumed to 
remain constant, and so the accessibility of each grid cell to each species 
was assumed to remain unchanged. For each grid cell, we compared its 
projected future environmental and accessibility conditions to every 
cell in the current (20th century) grid and selected the five cells that 
were most similar. If the majority of those cells were from current AESs, 
the projected cell was labelled as ‘AES-like’; otherwise, it was labelled 
as ‘not AES-like’. These projections therefore provide an indication of 
the future distribution of AES-like environmental conditions, under the 
assumptions that colonies do not move or change in size, and that the 
animals do not change their habitat preferences. These assumptions 
are unlikely to hold in reality; however, examining the changes in AES-like
habitat under these assumptions allows us to isolate the effects
of environmental change from colony or habitat-usage changes. As 
environmental change occurs, species are likely to adapt by changing 
their colony distributions and habitat usage. The AES projections offer
insights into the likely distribution of environmental pressures, and
thus where adaptation by species might be important. 
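
The classification rule can be sketched as a plain k-nearest-neighbour vote with k = 5. Each cell is represented here by a short tuple of already normalized predictor values; the example coordinates and labels are invented, and a brute-force search is used for clarity rather than speed.

```python
import math


def knn_label(future_cell, current_cells, current_is_aes, k=5):
    """Label a projected cell by majority vote of its k most similar current cells."""
    nearest = sorted(
        range(len(current_cells)),
        key=lambda i: math.dist(future_cell, current_cells[i]),  # Euclidean
    )[:k]
    votes = sum(current_is_aes[i] for i in nearest)
    return "AES-like" if votes > k / 2 else "not AES-like"


current_cells = [(0.10, 0.10), (0.20, 0.10), (0.15, 0.20),
                 (0.90, 0.90), (0.80, 0.95), (0.85, 0.80), (0.95, 0.85)]
current_is_aes = [True, True, True, False, False, False, False]

label_a = knn_label((0.12, 0.12), current_cells, current_is_aes)
label_b = knn_label((0.90, 0.90), current_cells, current_is_aes)
```

Repeating the vote for each climate model gives, for every cell, the proportion of models that classify it as AES-like.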

Climate data were compiled from eight global climate models 
(ACCESS1.0, BCC-CSM1.1, CanESM2, CMCC-CM, EC-EARTH, GISS-E2-H- 
CC, MIROC-ESM and NorESM-M), which were considered to be most 
suitable for Southern Ocean studies by virtue of reliably reproducing 
extant sea-ice conditions. These models were from phase five of the 
Coupled Model Intercomparison Project (CMIP5) of the World Cli- 
mate Research Programme. For each model, we extracted data for a 
30-year period concomitant with our tracking data (1976-2005), and 
for a 30-year period at the end of the 21st century (2071-2100). We
extracted future (2071-2100) climate data from projections under 
two RCP simulations: a medium-forcing scenario (RCP4.5, which 
assumes that society implements changes to limit future CO₂ emissions
in the near future, with peak emissions occurring in 2040) and
a more-extreme, high-forcing scenario (RCP8.5, which assumes little
curbing of emissions and retains a strong reliance on fossil fuels into 
the foreseeable future)*°. Reference data (1976-2005) were extracted 
from hindcast model runs that attempt to simulate historical conditions,
and consequently use observed CO₂ concentrations over the
past 160 years to guide the models. 

A maximum of eight variables were extracted for each model, 
depending on the available data (not all models provide all variables), 
at monthly time resolution. The variables used were sea-ice concentra- 
tion, SST, sea surface salinity, sea surface height, the spatial gradient 
of sea surface height, near-surface current speed, near-surface wind 
speed and surface downward heat flux. The 30-year mean and standard 
deviation of each variable was calculated over summer (December to 
February) and winter (July to September) months. All variables were 
normalized to the range 0-1 before further analysis. 
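
The seasonal summaries and the rescaling can be sketched as below for one variable in one cell. The use of the sample standard deviation and the monthly dictionary layout are assumptions; the month sets follow the summer and winter definitions given above.

```python
from statistics import mean, stdev

SUMMER = {12, 1, 2}   # December to February
WINTER = {7, 8, 9}    # July to September


def seasonal_summary(monthly, months):
    """Mean and standard deviation of monthly values falling in the given months.

    monthly maps (year, month) -> value.
    """
    vals = [v for (year, month), v in monthly.items() if month in months]
    return mean(vals), stdev(vals)


def minmax(values):
    """Rescale values linearly to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant layer: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


monthly = {(2000, 1): 1.0, (2000, 2): 3.0, (2000, 7): -1.0, (2000, 8): -1.0}
summer_mean, summer_sd = seasonal_summary(monthly, SUMMER)
scaled = minmax([2, 4, 6])  # [0.0, 0.5, 1.0]
```

Rescaling puts variables with very different units on a common footing before distances between cells are computed.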

The resulting set of up to 48 predictors (mean and standard devia- 
tion of up to 8 environmental variables, each for summer and winter, 
plus accessibility layers for 16 species) naturally showed high correla- 
tion between many of the variables. We used a principal component 
analysis to reduce the dimensionality of this dataset, choosing the 
lowest number of principal components required to explain at least 
95% of the variance in the original data; this number ranged from 
14 to 17 components, depending on the model and scenario. For each
projected-climate cell, the nearest neighbours in the historical-climate 
grid were calculated using Euclidean distance on these normalized and 
dimension-reduced data. 
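
The rule for choosing the number of principal components can be sketched independently of the decomposition itself. Given the component variances (eigenvalues), the function below returns the smallest number of leading components whose cumulative share reaches 95%; the eigenvalues in the example are invented.

```python
def n_components(eigenvalues, target=0.95):
    """Smallest number of leading components explaining at least `target` variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for n, ev in enumerate(ordered, start=1):
        running += ev
        if running / total >= target:
            return n
    return len(ordered)


# Cumulative shares are 0.60, 0.90 and 0.96, so three components suffice.
k = n_components([6.0, 3.0, 0.6, 0.3, 0.1])
```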

Reporting summary


Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 
The tracking data are available through our companion paper”. 


Code availability 
Computer code is available at https://github.com/SCAR/RAATD. 

Extended Data Fig. 1 | Overview of the modelling process. a, Habitat
importance for a given life-history stage (for example, chick-rearing) of a given
species (for example, king penguin (A. patagonicus)) is calculated using two
models (grey boxes): the habitat-selection model (box 1) and the habitat
accessibility model (box 2). b, These stage-specific, species-specific
predictions of habitat importance are combined to calculate the mean habitat
importance for multiple species (for example, king penguin and Antarctic fur
seal (Arctocephalus gazella)). In the habitat accessibility model (box 2 in a) the
distance to colony can be weighted by relative colony size or not. The
unweighted version is shown here.
Extended Data Fig. 2 | Maps showing the 19 environmental covariates that 
were used to model the habitat selection of marine predators in the 
Southern Ocean. Grey lines indicate major oceanographic fronts. CHLA, 
chlorophyll a concentration; CURR, geostrophic current velocity; DEPTH, 
depth; DEPTHg, depth gradient; dSHELF, distance to shelf; EKE, eddy kinetic 
energy; ICE, sea-ice concentration; ICEA, accessibility through sea ice; ICEsd, 
standard deviation of sea-ice concentration; SAL, salinity difference; SHFLUX, 
surface heat flux; SHFLUXsd, standard deviation of surface heat flux; SSHa, sea 
surface height anomaly; SSHsd, sea surface height standard deviation; SST, sea 
surface temperature; SSTg, sea surface temperature gradient; VMIX, vertical 
velocity; VMIXsd, standard deviation of vertical velocity; WIND, surface wind 
speed. Sources and units of measurement are defined in Supplementary 
Table 2. 
Extended Data Fig. 3 | Habitat-importance scores for 16 marine predator 
species in the Southern Ocean. The maps show predicted habitat importance 
for each species. Predictions for macaroni penguins (Eudyptes chrysocome) 
and royal penguins (Eudyptes schlegeli) are combined. Black circles show all 
known colony locations for the 14 colony-breeding species, which we used to 
predict the models across the whole Southern Ocean. 
Extended Data Fig. 4 | Covariate importance. Relative importance of 
19 environmental variables that were used as predictors in 40 boosted 
regression tree models of the habitat selection of Southern Ocean marine 
predators. Higher values of variable relative importance indicate that the 
variable has higher predictive power. Points show the values for each model 
and box plots (in grey, behind) show the distribution of values. Variables are 
ordered (top to bottom) by decreasing median importance. The three panels 
show the results for three different groups of species that were identified by 
hierarchical cluster analysis (see ‘Species grouping’ in Methods, and Extended 
Data Fig. 7). Full covariate names are provided in Supplementary Table 2. Box 
plots as in Fig. 4. 
Extended Data Fig. 5 | Varied relationships between covariates and habitat 
selection across species. Scatter plot smoothed curves (black lines) of the 
relationship between predictions of the species habitat-selection models 
(boosted regression trees) (vertical axis) and the values of covariates used as 
predictors in our boosted regression tree models (horizontal axis). The 
smoothed curves were drawn by fitting generalized additive models for large 
datasets with a thin plate regression spline basis, as LOESS (locally estimated 
scatter plot) smoothing was not computationally feasible. Full covariate names 
and units are provided in Supplementary Table 2. Higher habitat-selection 
values indicate higher probabilities of use, irrespective of availability in this 
case. A smooth curve is shown for each species. Because each species had one 
to five predictions, for different life-history stages, we took the mean habitat- 
selection estimate per cell for each species. Rug marks on the horizontal axis 
indicate the distributions of the data points. 
Extended Data Fig. 6 | Potential environmental stressors in the Southern 
Ocean. a-c, Maps showing the change (mean in 1987-1998 compared to mean 
in 2007-2017) in sea-ice duration (days) (a), SST (°C) (b) and wind speed (m s⁻¹) 
(c). Contour lines (black) indicate AESs. d-f, Kernel density plots show the 
distribution of values of each of a-c inside (red) and outside (grey) AESs. 
Horizontal lines represent zero change. Two-tailed permutation tests indicate 
significant differences in each case, and the number of grid cells included in the 
test is given in each case (n). 
Panel annotations: d, sea-ice duration, higher incidence of decreases inside AES 
(n = 466,838; Z = 2.99, p = 0.003); e, sea surface temperature, slightly warmer 
outside AES (n = 1,081,581; Z = 67.9, p < 0.001); f, wind speed, slightly stronger 
outside AES (n = 1,119,306; Z = 112.14, p < 0.001). 
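
A two-tailed permutation test of the kind named in this legend can be sketched in a few lines. The sketch assumes the difference in group means as the test statistic and uses a hypothetical function name; the legend reports Z values, so the authors' exact statistic may differ.

```python
import random
from statistics import mean

def permutation_test(inside, outside, n_perm=2000, seed=0):
    """Two-tailed permutation test on the difference in means.

    Assumption: the statistic is mean(inside) - mean(outside); group
    labels are shuffled n_perm times to build the null distribution.
    """
    rng = random.Random(seed)
    observed = mean(inside) - mean(outside)
    pooled = list(inside) + list(outside)
    n_in = len(inside)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_in]) - mean(pooled[n_in:])
        if abs(diff) >= abs(observed):     # two-tailed: compare magnitudes
            extreme += 1
    # add-one correction so the estimated P value is never exactly zero
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

With more than a million grid cells per panel, a vectorized implementation would be needed in practice; the loop above only illustrates the logic.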
shows the proportion of climate models that indicate loss (orange) or retention 
(blue) of AESs. Similarly, the gradation from white to green shows the 

according to the eight different climate models (black points), and the mean of 
these (red points). Box plots as in Fig. 4. 
Extended Data Fig. 8 | Comparison of unweighted and weighted overall 
habitat importance. a, Overall habitat importance, calculated without 
accounting for colony sizes. b, Overall habitat importance if colony sizes are 
taken into account. Black points indicate colony locations for the 14 colony- 
breeding species; white contours indicate AESs. See Methods and 
Supplementary Information for details. 
Extended Data Fig. 9 | Dendrogram of hierarchical cluster analysis showing 
species groups in the dataset. We performed UPGMA hierarchical cluster 
analysis on the Manhattan distance among species, calculated from the 
habitat-importance scores. The results show two clear species groups: 
Antarctic (blue) and sub-Antarctic (magenta). Humpback whales and southern 
elephant seals (orange) did not fall into either group and we assigned them to 
both groups for subsequent analyses. The cophenetic correlation coefficient 
between the distance matrix and the dendrogram was 0.86, which means that 
the dendrogram is a good representation of the Manhattan distance values 
among the species. Values can range from 0 (no correlation) to 1 (perfect 
correlation). 
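
The clustering summarized in this legend can be illustrated with SciPy. This is a hedged sketch, not the authors' code: the function names and the input layout (one row of habitat-importance scores per species) are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

def upgma_manhattan(scores):
    """UPGMA tree on Manhattan distances plus its cophenetic correlation.

    Assumption: `scores` is a (species x grid cells) array.
    """
    dist = pdist(scores, metric="cityblock")   # Manhattan distance
    tree = linkage(dist, method="average")     # average linkage = UPGMA
    coph_corr, _ = cophenet(tree, dist)
    return tree, float(coph_corr)

def cut_groups(tree, n_groups=2):
    """Cut the dendrogram into at most n_groups flat clusters."""
    return fcluster(tree, t=n_groups, criterion="maxclust")
```

A cophenetic correlation near 1 indicates that the merge heights in the tree preserve the original pairwise distances well, which is the sense in which the legend calls 0.86 a good representation.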
Extended Data Fig. 10 | Mean habitat importance of Antarctic and sub- 
Antarctic species. a, b, To account for regional differences in species richness 
we defined two species groups (Methods and Extended Data Fig. 5) and 
calculated the mean habitat importance for these two groups separately. These 
two mean habitat-importance layers (a and b) were then combined into a single 
overall habitat-importance layer by choosing the maximum value in each cell. 
Black points indicate the colony locations of colony-breeding species in each 
species group. 
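
The two-step combination in this legend (a per-group mean followed by a per-cell maximum) can be written compactly. The sketch below uses hypothetical names and assumes that every species layer is an array on the same grid; a species assigned to both groups simply appears in both membership lists.

```python
import numpy as np

def overall_importance(layers, groups):
    """Mean habitat importance per species group, then cell-wise maximum.

    layers: dict mapping species name -> 2D array (same grid for all).
    groups: dict mapping group name -> list of species names.
    """
    group_means = {
        name: np.mean(np.stack([layers[s] for s in members]), axis=0)
        for name, members in groups.items()
    }
    overall = np.maximum.reduce(list(group_means.values()))
    return group_means, overall
```

Taking the maximum rather than a grand mean keeps a cell that matters to only one group from being diluted by the other group's low scores.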
Prostate cancer is the second most common cancer in men worldwide’. Over the past 
decade, large-scale integrative genomics efforts have enhanced our understanding of 
this disease by characterizing its genetic and epigenetic landscape in thousands of 
patients”*. However, most tumours profiled in these studies were obtained from 
patients from Western populations. Here we produced and analysed whole-genome, 
whole-transcriptome and DNA methylation data for 208 pairs of tumour tissue 
samples and matched healthy control tissue from Chinese patients with primary 
prostate cancer. Systematic comparison with published data from 2,554 prostate 
tumours revealed that the genomic alteration signatures in Chinese patients were 
markedly distinct from those of Western cohorts: specifically, 41% of tumours 
contained mutations in FOXA1 and 18% each had deletions in ZNF292 and CHD1. 
Alterations of the genome and epigenome were correlated and were predictive of 
disease phenotype and progression. Coding and noncoding mutations, as well as 
epimutations, converged on pathways that are important for prostate cancer, 
providing insights into this devastating disease. These discoveries underscore the 
importance of including population context in constructing comprehensive genomic 
maps for disease. 


Prostate cancer remains the second most common cancer in men world- 
wide, with more than 1,275,000 new diagnoses and 350,000 deaths 
annually’. It is characterized by a long and variable natural history, 
extensive intra- and inter-tumour heterogeneity, and diverse clinical 
behaviour*. Our understanding of the genomic definition and molecular 
complexity of prostate cancer has markedly improved over the past dec- 
ade, owing to the advent of next-generation sequencing-based technolo- 
gies and integrative genomics. Large consortia, including The Cancer 
Genome Atlas (TCGA), have profiled the molecular signatures of both 
primary and metastatic castration-resistant prostate cancer, provid- 
ing insights into the disease and an invaluable community resource”. 

However, until now, most prostate cancer genomics data have been 
derived from Western populations”** ” (Supplementary Data 1), 
although ethnic and racial background can profoundly influence the 
disease’’. The incidence and mortality rates of prostate cancer for 
Asians and Pacific Islanders are lower than those of the general US 
population’. In addition, Asian patients with prostate cancer often 
present with higher tumour grades at diagnosis, but have similar or 
better prognosis with androgen deprivation therapy’. In a recent pilot 
study, we found that Chinese patients with prostate cancer featured 
genomic abnormalities that were distinct from those of Western 
patients”. Therefore, we sought to define the genomic underpinnings 
of prostate cancer in Western and Chinese men. 

Here, we deliver the first, to our knowledge, Chinese Prostate Can- 
cer Genome and Epigenome Atlas (CPGEA). Along with a new cohort 
of more than 200 Chinese men, we integrated existing datasets from 
2,554 patients with prostate cancer representing 13 Western cohorts, 
as well as our pilot Chinese cohort”? >”. Our study revealed markedly 
different distributions of genetic lesions from those established by the 
TCGA’ and defined genomic signatures both unique to Chinese popu- 
lations and common to Chinese and Western disease. We also defined 
epimutation patterns for Chinese prostate cancer and illustrated the 
interaction between genetic mutations and epimutations. Finally, we 
highlighted that coding and noncoding mutations and epimutations 
converge on common pathways that underscore the biology of prostate 
cancer. Our study illustrates a paradigm for comprehensive cancer 
genome atlases in which population-specific contexts are taken into 
consideration. 


A list of affiliations appears at the end of the paper. 
Fig. 1 | Molecular landscape of the CPGEA cohort. Each column represents an 
individual tumour (n = 208). Patients were separated into three groups: those 
without treatment (left), with pre-treatment (centre, n = 16) and with 
metastatic cancer (right, n = 18). The top panel shows clinical features, 
including PSA level, Gleason score (GS), T-category, and age, as per the colour 
key. Each subsequent panel displays a specific molecular profile: CNA 
segments per patient; number of structural variations (SV); number of gene-to- 
gene fusions; coding and noncoding mutations; number of hyper- and 
hypomethylated CpGs; and the fraction of each Alexandrov signature in the 
genome. ND, not determined. Asterisks indicate no data available. 


Clinical samples and data generation 

We analysed primary tumour tissue and matched healthy tissue from 
208 patients who underwent radical prostatectomy for clinically 
localized prostate cancer. Confirmation procedures, age, levels of 
prostate-specific antigen (PSA), Gleason scores, and other clinical and 
pathological characterizations are described in Extended Data Fig. 1a, 
Supplementary Data 2 and Supplementary Discussion. We character- 
ized all samples with whole-genome sequencing (WGS), whole-genome 
bisulfite sequencing (WGBS), RNA sequencing (RNA-seq) and microRNA 
sequencing (miRNA-seq) for a total of 1,268 datasets (Extended Data 
Fig. 1b). Treatment-naive tumours (177 out of 208) were used in the 
integrative and comparative analysis (Fig. 1), with the exclusion of two 
outliers (Supplementary Discussion). The study populations and results 
are summarized in a supporting website (http://www.cpgea.com). 


Somatic mutation landscape 
To compare the CPGEA cohort with previously profiled populations, 
we defined tumour mutation burden, mutation signature, and copy 
number variation, as well as identifying significantly mutated genes 
and recurrently mutated noncoding regions, using custom bioinfor- 
matics pipelines based on the TCGA’ and Pan-Cancer Analysis of Whole 
Genomes (PCAWG)’ pipelines (Methods). To ensure meaningful com- 
parison between Chinese and Western prostate cancer cohorts, we 
reprocessed the raw TCGA data using our pipelines. The results were 
strongly concordant with published TCGA results and recapitulated 
all major genomic signatures of Western primary prostate cancer 
(Extended Data Fig. 1c-f, Supplementary Discussion, Supplementary 
Data 3). Therefore, most comparisons between CPGEA and TCGA were 
based on uniformly processed data using the CPGEA pipelines, and all 
other comparisons were kept at a high level. 

Across the CPGEA cohort, the median substitution rate was 1.4 per 
megabase (Mb) and the mutation burden was 1.0 per Mb, confirming the 
low mutation burden observed in prostate cancer in Western cohorts” 
(Figs. 1, 2a). Alexandrov signatures” 1, 3, 5, 8, 9 and 16 were predominant 
in most samples and comprised 78% of mutations (Fig. 1, Extended Data 
Fig. 1g, h). Signatures 8 and 16 strongly correlated with Gleason score. 

Prostate cancer is characterized by genomic instability, manifesting 
as recurrent copy number alterations (CNAs)” and DNA rearrange- 
ments”. We detected 20,375 copy number gains and 11,187 losses across 
the CPGEA cohort (Fig. 1), including 14 recurrent gains and 17 losses 
involving 921 amplified and 1,373 deleted genes (Fig. 2b, Supplementary 
Data 4). Consistent with previous studies””, the tumours clustered into 
three groups based on CNA frequency (Extended Data Fig. 2a), and CNA 
burden was a prognostic biomarker for biochemical recurrence (BCR) 
(Extended Data Fig. 2b). However, Chinese and Western prostate cancer 
exhibited some differences in CNA frequency (Fig. 2b, c). For example, 
PABPC1 and YWHAZ were more often amplified in CPGEA (5.8% versus 
0.88% in TCGA, P= 0.04, Pearson’s χ² test), whereas CHD1 was more 
often deleted (17.8% versus 4.4%, P=3 x 10“) (Fig. 2d, Extended Data 
Fig. 2c, Supplementary Data 3). Known lesions in DNA repair pathways 
did not explain these patterns of structural variations”, but the CHD1 
deletion was associated with intra-chromosomal changes, reflecting its 
role in genome stability® (Extended Data Fig. 2d). These CNAs broadly 
influenced 12 oncogenic pathways (http://www.cpgea.com). 
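
As an illustration of how alteration frequencies in two cohorts can be compared with Pearson's χ² test, the sketch below computes the statistic for a 2 × 2 table using only the standard library. The function name is hypothetical and no continuity correction is applied; it is not claimed to reproduce the P values reported above, which depend on the exact counts and settings the authors used.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for a 2 x 2 table (no continuity correction).

    a, b: altered / unaltered tumours in cohort 1
    c, d: altered / unaltered tumours in cohort 2
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

When expected counts in any cell are small, as can happen for alterations seen in under 1% of a cohort, an exact test is often preferred over the χ² approximation.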

In addition, we detected 34,598 somatic structural variations, 5,144 
of which were inter-chromosomal (Fig. 1, Extended Data Fig. 3a, b). 
The recently reported high-frequency structural variation involv- 
ing an androgen receptor (AR) enhancer” was undetectable in our 
cohort, although whether it is present in more advanced, metastatic 
prostate cancer remains to be determined. We identified the muta- 
tional processes chromothripsis in 49% of our cohort (101 out of 208) 
and chromoplexy in 42% (87 out of 208), which was comparable 
to the rate identified in Western primary cancer (45%, PCAWG)°”° 
(Extended Data Fig. 3a, Supplementary Data 5). Finally, we identified 
potential driver events stemming from structural rearrangements 
(Extended Data Fig. 3c—e, Supplementary Data 5). 

Fig. 2 | The genomic alteration landscape in CPGEA and TCGA. 
a, Distribution of mutation burdens in each cohort. Each dot corresponds to a 
mutation burden calculated from a tumour-normal pair. The red horizontal 
bar indicates the median mutation burden for the CPGEA (1.00 per Mb) and 
TCGA (0.70 per Mb) cohorts. b, Genomic regions with significantly recurrent 
somatic CNAs. c, Heat map showing genome-wide CNAs with estimated actual 
copy numbers. d, Gene-level alteration frequencies in CPGEA and TCGA. 
The same pipeline used for the CPGEA cohort was used to predict genomic 
alterations for the TCGA cohort from raw sequencing data. 

From the RNA-seq data, we detected 382 gene-to-gene fusions, 4 of 
which were previously reported in prostate cancer, 73 in other cancers, 
and 305 of which were novel (Extended Data Fig. 4a). There were 71 inter- 
chromosomal fusions. We found support for 132 (34.6%) of the fusion 
candidates in matched WGS data, and validated 79 out of 88 selected 
candidates (90%) using Sanger sequencing (Extended Data Fig. 4b, Sup- 
plementary Data 6, http://www.cpgea.com). In stark contrast to Western 
prostate cancer, the hallmark ETS fusion (53%)? was much less frequent 
(9% of CPGEA tumours) (Fig. 2d, Extended Data Figs. 3a, 4c). Instead, 
the top gene fusions in Chinese prostate cancer were SCHLAP1-UBE2E3 
(29%) and PAOX-MTG1 (10%) (Extended Data Fig. 4d). SLC45A2-AMACR 
was identified in 7% of CPGEA tumours, versus 15% in progressive pros- 
tate cancer’ (Extended Data Fig. 4e). We also identified and validated 
a rare SND1-BRAF fusion”® (Extended Data Fig. 4f). 

We defined 83 significantly mutated genes in the CPGEA cohort, 
including SPOP, FOXA1, KDM6A and ZMYM3, as well as new prostate 
cancer driver genes (Extended Data Fig. 4g-i). There were 625 genes 
significantly differentially mutated between Chinese and Western 
primary prostate cancer (Extended Data Fig. 4j, Supplementary Data 3). 

Finally, we defined the spectrum of noncoding mutations, obtain- 
ing 41,109 potentially functional noncoding single nucleotide vari- 
ants (SNVs) (Extended Data Fig. 5a). Recurrent noncoding mutations 
occurred at a frequency comparable to that of driver coding mutations, 
and more than half were within regulatory elements, including 7.8% in 
promoters and 54% in enhancers” (Extended Data Fig. 5b). Recurrent 
mutations targeted the enhancers of 184 genes, including TBL1XR1, 
FOXA1 and FLI1 (Extended Data Fig. 5c-h, Supplementary Data 7). 
Noncoding mutations resulted in the gain or loss of binding sites for 
20 or 97 transcription factors, respectively (Extended Data Fig. 5e, 
Supplementary Data 7). For example, the mutation associated with 
TBL1XR1 disrupted a predicted NRF1-binding site, which correlated 
with lower expression of TBL1XR1 in affected tumours (Extended Data 
Fig. 5f). However, although genome-wide association studies routinely 
suggest that most important variants are in noncoding regions, only 
mutations in the TERT promoter have been proven to drive cancer”. 
Despite state-of-the-art pipelines (Methods), we were also unable to 
confirm that any of the somatic noncoding mutations was a cancer 
driver. In agreement with PCAWG*, we conclude that a larger sample 
size and more sophisticated analytical paradigms are needed to com- 
prehensively catalogue noncoding driver mutations, which are likely 
to have both a small and complex effect. 


FOXA1 mutation in Chinese prostate cancer 


Fig. 3 | FOXA1 mutations in CPGEA. a, Mirror distribution plot of FOXA1 
somatic SNVs in CPGEA primary (blue, top), Western primary (blue, bottom), 
and Western metastatic (orange, bottom) cancers. Asterisks indicate 
mutations found in both CPGEA and Western cohorts. The hash symbol (#) 
indicates mutations also identified in a pre-treated CPGEA patient. FH, 
forkhead. b, FOXA1 expression level as a function of FOXA1 mutation type. 
P values determined by two-sided Mann-Whitney U-test. c, AR score as a 
function of FOXA1 mutation type. b, c, Patients treated before surgery are 
labelled by shape. d, Mutated FOXA1 allele expression. Each mutation is 
represented by a circle positioned by its DNA mutation allele frequency (MAF) 
(x axis) and RNA MAF (y axis). The size of the circle indicates statistical 
significance (negative log-transformed P value). 

FOXA1 is a pioneer factor that targets AR and has a demonstrated 
role in the oncogenesis of prostate cancer. High levels of FOXA1 are 
associated with poor prognosis, whereas low FOXA1 levels, even in the 
presence of high AR expression, predict good prognosis*‘. Notably, 
FOXA1 was the most highly mutated gene in our Chinese cohort (41%) 
(Figs. 2d, 3a). By contrast, FOXA1 was mutated in only 4% of TCGA pros- 
tate cancer and in 8-9% of primary prostate cancer and 12-13% of meta- 
static prostate cancer** >* in other cohorts (Fig. 2d, Supplementary 
Data 3). In total, 26 of the CPGEA FOXA1 mutations were missense and 63 
were insertions or deletions (indels), 13 of which resulted in a frameshift 
(Fig. 3a, Supplementary Data 8). All were validated by Sanger sequenc- 
ing and RNA-seq analysis (Extended Data Fig. 6a, Supplementary 
Discussion). Proteomic profiling of one tumour with a FOXA1 in-frame 
deletion detected a peptide confirming the deletion (Extended Data 
Fig. 6b). 

The mutational spectrum of FOXA1 in tissue samples from Chinese 
and Western patients exhibited notable differences (Fig. 3a). In tissue 
samples from the Western cohort, mutations covered the entire coding 
sequence, although there was a hot spot immediately after the fork- 
head domain, which mediates DNA binding (Extended Data Fig. 6c). 
By contrast, almost all mutations in samples from the Chinese cohort 
occurred within the hot spot. This region may mediate the FOXA1-AR 
interaction’, which suggests that FOXA1 mutations in Chinese patients 
may drive oncogenesis by modulating AR signalling. 

Two recent studies characterized the molecular mechanism of many 
of the FOXA1 mutations found here, providing direct support for their 
oncogenic capacity*”*®. Most missense and in-frame indel mutations 
were activating mutations that targeted the wing-2 region and enabled 
enhanced chromatin mobility and binding frequency. The frameshift 
mutations truncated the C-terminal domain and increased FOXA1 bind- 
ing affinity, expanded the target site repertoire, and abolished the 
TLE3-mediated WNT pathway. All of the mutations promoted growth 
and enhanced FOXA1 binding to AR binding regions, while reducing 
AR binding. Accordingly, we observed higher FOXA1 expression in 
tumours with FOXA1 mutations than those with the wild-type gene 
(Fig. 3b). Tumours with FOXA1 frameshift deletions had the highest 
expression, which translated to the highest AR score? (Fig. 3c). In addition, 
FOXA1 mutant alleles were almost always more highly expressed 
than the wild-type allele in patients with both (Fig. 3d). 

Fig. 4 | DNA methylation abnormalities and subtypes of the CPGEA prostate 
cancer cohort. a, Distribution of CpG methylation levels in tumours and 
matched normal samples. Thin lines represent 187 pairs of normal and tumour 
samples; thick lines represent the average across each group. 
b, Average CpG methylation levels in PMDs as a function of Gleason score for 
each patient. P values determined by one-sided Mann-Whitney U-test. Box 
plots show median levels and the first and third quartiles, and whiskers show 
1.5x the interquartile range. Each dot indicates a tumour. c, Correlation of 
CIMP-CGI methylation with PMD methylation (Pearson’s correlation 
coefficient r=-0.78, P=8.8 x10”). Each dot represents a sample. 
d, Correlations between epigenetic alterations and genetic alterations or 
clinicopathological features of tumours (n = 187). Dot size and colour indicate 
the magnitude and direction of Spearman’s rank order correlation coefficient 
ρ, respectively. P value indicated by background colour. e, CPGEA prostate 
cancer subtyping. iClusterPlus integration of three techniques defined four 
molecular subtypes for Chinese prostate cancer. Clinical features (top) and 
molecular data for 126 tumours (rows) are displayed as heat maps. 

Consistent with its role as a pioneer factor, known FOXA1-binding 
regions remained hypomethylated in both normal tissue and tumours 
(Extended Data Fig. 6d). However, we also observed a statistically 
significant reduction in DNA methylation in tumours with frameshift 
FOXA1 deletions over FOXA1 binding motifs that were normally hyper- 
methylated and cryptic binding sites specific to mutated FOXA1*®. 
These results suggest that specific FOXA1 mutations could result in a 
gain-of-function oncogenic protein that potentially activates ectopic 
binding sites. 

Most FOXA1 mutations in Chinese prostate cancer were clonal based 
on the cancer cell fraction score’ (Extended Data Fig. 6e-h), under- 
scoring the likelihood that these mutations were driver events. We also 
examined the pairwise co-occurrence or mutual exclusivity of major 
genetic alterations using SELECT” (Supplementary Data 8). Notably, 
FOXA1 was mutually exclusive with ETS fusions. We also found that 
FOXA1 mutation significantly co-occurred with CHD1 deletion and a 
CECR7-IL17RA gene fusion. 

To dissect the downstream effects of FOXA1 mutations, we exam- 
ined differential gene activities based on FOXA1 mutation patterns. 
For instance, the nicotinate and nicotinamide metabolism pathway 
was differentially expressed between the in-frame indel and missense 
groups (Extended Data Fig. 6i). Some differentially affected genes are 
directly implicated in metabolizing chemicals involved in androgen 
deprivation therapy”, such as HSD17B6, which catalyses the conversion 
of androstanediol to dihydrotestosterone. Expression of HSD17B6 was 
much higher in tumours with missense FOXA1 mutations than those 
with frameshift deletions. Thus, FOXA1 mutation patterns might be an 
important indicator of the efficacy of androgen deprivation therapy. 


DNA methylation abnormalities 

To complement our analysis of genetic alterations in prostate cancer, we 
profiled DNA methylation in the CPGEA cohort using WGBS (Extended 
Data Fig. 7a). DNA methylation was profiled using probe arrays in TCGA, 
such that our study represents the first, to our knowledge, joint analysis 
of genome-wide genetic mutation and epimutation rates in prostate 
cancer. As expected”, prostate cancer genomes were hypomethylated 
relative to normal prostate tissue (Fig. 4a, Extended Data Fig. 7b, 
Supplementary Discussion), and 5’ untranslated regions (UTRs) and 
CpG islands (CGIs) were relatively hypermethylated, whereas exons, 
introns and repetitive elements were hypomethylated (Extended 
Data Fig. 7c, d). By contrast, non-CG methylation was negligible and 
exhibited no significant difference between normal tissue and tumours 
(Extended Data Fig. 7e). 

Megabase-scale partially methylated domains (PMDs)*” ** were 
widespread and accounted for the observed global hypomethylation, 
affecting up to half of the cancer genome (Extended Data Fig. 7f, g). 
One-third of these were recurrent in most tumours (Extended Data 
Fig. 7h). PMDs exhibited a significantly higher somatic mutation fre- 
quency and lower gene expression compared with non-PMD regions 
(Extended Data Fig. 7i,j). Notably, tumours with a Gleason score greater 
than six exhibited much lower levels of PMD methylation than those 
with a score of six (Fig. 4b), which suggests that tumour progression 
correlates with the degree of genome-wide hypomethylation. 

We further defined local differentially methylated regions (DMRs), 
including 96,037 hypomethylated DMRs (hypoDMRs), 1.2% of which 
were recurrent (shared by at least 10 tumours), and 17,131 hypermethylated 
DMRs (hyperDMRs), 25% of which were recurrent (Extended Data 
Fig. 8a, b, Supplementary Discussion). Both sets were significantly 
enriched in promoters and enhancers (Extended Data Fig. 8c), with 
19% and 44% of recurrent hypo- and hyperDMRs overlapping promot- 
ers, respectively (Extended Data Fig. 8d). Recurrent hyperDMRs were 
associated with genes involved in the regulation of development and 
cell fate, whereas hypoDMRs were associated with genes that were 
upregulated in prostate cancer, in human prostate adenocarcinoma 
LNCaP cells, and in cells exposed to androgen (Extended Data Fig. 8e, f). 
Hypermethylated promoters silenced the expression of well-known 
tumour-suppressor genes, including miRNAs (Extended Data Fig. 8g). 
In total, 289 CGIs displayed significant hypermethylation consistent 
with the CpG island methylator phenotype (CIMP) (Extended Data 
Fig. 8h-j). Notably, the DNA methylation level of CIMP CGIs negatively 
correlated with the PMD methylation level (Fig. 4c). CIMP+ tumours 
had a shorter BCR (Extended Data Fig. 8k). 

Finally, we calculated the epimutation burden as the fraction of 
CpGs that were significantly differentially methylated between each 
tumour and its paired normal sample (0.013 to 0.45). The epimuta- 
tion burden was positively correlated with mutation burden and CNA 
burden (Fig. 4d, Extended Data Fig. 8l, m). A concordance of genetic 
and epigenetic alterations was also observed during tumour evolution 
in individual patients*, and we now confirm this phenomenon across 
tumours in a large population. 


Molecular subtypes of Chinese prostate cancer 


Next, we defined molecular subtypes for the CPGEA prostate cancer 
cohort. The seven molecular classes defined by TCGA using oncogenic 
drivers’ were all present in the Chinese population, although at very 
different proportions (Extended Data Fig. 9a). Integrative analysis 
with iCluster on CNA, gene expression and DNA methylation data 
defined four subtypes within the CPGEA patients (Fig. 4e), three of 
which (subtypes B-D) correspond to subtypes identified by TCGA in 
Western populations (Extended Data Fig. 9b-e). Subtype A was unique 
to CPGEA and was characterized by numerous CNAs affecting genes, 
including RB1, HDAC2 and ZNF292, which was previously associated 
with an ERG fusion-negative pattern**®. Subtype A also exhibited high 
expression of AR and related pathways and was the only subtype with 
ZNF365 mutations. Although FOXA1 mutation did not segregate cleanly 
with any subtype, more than half of the patients in subtypes A (60%) 
and D (56%) had a FOXAI mutation, compared to 35% in subtype B and 
44% in subtype C. 

Clustering using individual data types mostly recapitulated the 
subtypes, including miRNA expression, which was not included in 
the integrative analysis (Fig. 4e, Extended Data Figs. 2a, 9f-h, Sup- 
plementary Discussion). 


Convergence on common oncogenic pathways 


Genomic alterations are known to target common cancer pathways, 
even though component genes are not altered at equal frequency. 
We next compared the mutational landscape between Chinese and 
Western prostate cancer in a pathway-centric manner, focusing on 
12 important pathways (Extended Data Fig. 10a—c, Supplementary 
Data 9, Supplementary Discussion, http://www.cpgea.com). The over- 
all pathway-level mutation burden was similar between Chinese and 
Western primary tumours, whereas metastatic tumours exhibited 
a much higher burden (Extended Data Fig. 10d). Notably, although 
AR itself was unaltered in the CPGEA primary prostate cancer cohort, 
other genes in the AR pathway were altered in 61% of Chinese patients, 
versus 84% in Western metastatic prostate cancer, in which direct altera- 
tions of the AR gene dominate (Extended Data Fig. 10d). In addition, 
we observed repeated noncoding and epimutations in these pathways, 
supporting the paradigm that noncoding alterations can contribute to 
pathway-level differential expression (Extended Data Fig. 10e, http:// 
www.cpgea.com). Despite different patient cohorts, experimental 
and sequencing technologies, and molecular alterations, the same 
key pathways emerged as crucial for prostate cancer in the Western 
and Chinese cohorts. 

Although pathway disturbance could potentially explain 85.4% of the 
Chinese prostate cancer cases (Extended Data Fig. 10d), we identified 
few druggable pathways. Querying the OncoKB knowledge base”, 
we found no level 1 or 2A potentially actionable alterations in either 
the Chinese or the Western cohorts. Only 5.3% of patients contained a 
lower level potentially actionable mutation (Extended Data Fig. 10f), 
highlighting the considerable challenge in treating this deadly disease. 


Conclusions 


In this study, we present a comprehensive atlas of prostate cancer in 
Chinese men (CPGEA), which deepens our understanding of the disease 
by incorporating ethnic background. Comparative analysis of samples 
from CPGEA and Western cohorts revealed marked disparities in the 
mutational landscape of the same disease. Although ETS fusions have 
long been regarded as the flagship mutation for prostate cancer, our 
study indisputably positions FOXA1 mutations as the most prominent 
signature in Chinese populations. The frequency and unique pattern of 
FOXA1 mutations in Chinese prostate cancer underscores the need to 
investigate the oncogenic mechanism of individual mutations””**3", 
as well as factors that predispose Chinese individuals to them. In 
addition, a lack of targetable genetic mutations in either population 
suggests that future investigations should focus on understanding the 
tumorigenic potential of noncoding mutations, structural variations 
and epimutations and on translating this knowledge into therapeutic 
interventions. Answers to these questions could improve targeted 
therapy and prevention in the era of precision medicine while 
increasing global health equity. 


Methods 


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized, and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Pathological evaluation of specimens 

The CPGEA project was initiated in 2010. Ethical committee approval 
was obtained from Changhai Hospital (TMEC2014-001). Written 
informed consent was obtained in accordance with Chinese leg- 
islation. In total, 210 prostate tumour samples and non-cancerous 
matched healthy prostate tissue were collected from patients 
surgically treated in the Urology Department of Changhai Hospital. All 
samples were immediately frozen in liquid nitrogen and stored at −80 °C. 
Thin slices of snap-frozen, OCT-embedded tissue blocks were sent 
for haematoxylin and eosin (H&E) staining. After independent review 
by two professional uropathologists (Y. Yu and Y. Zhu), DNA and RNA 
were extracted from the same tissue blocks. Gleason score, tumour 
purity, extraprostatic extension, surgical margin and seminal vesicles 
were all evaluated according to EAU guidelines (https://uroweb.org/ 
wp-content/uploads/09-Prostate-Cancer_2017_web.pdf). 

All patient clinical information was deposited in our follow-up 
database PC-Follow®. The concentrations of PSA (ng ml⁻¹) in serum 
for each patient were measured at the time of diagnosis. We used the 
pathological T and N categories and the clinical M category to assign 
final stage information to each patient using standard National Com- 
prehensive Cancer Network (NCCN) criteria (http://www.nccn.org/ 
professionals/physician_gls/pdf/prostate.pdf). BCR was defined as 
a rise in the blood level of PSA to two consecutive measurements of 
0.2 ng ml⁻¹ or greater after treatment with surgery or radiation. Bone 
metastases were defined using emission computed tomography (ECT). 
Detailed clinicopathological data are available in Supplementary Data 2. 
All tumours with metastases were hormone-sensitive. 


DNA and RNA isolation, quantification and qualification 
Genomic DNA was isolated using the Qiagen MinElute Kit. Quality was 
verified by the following two methods: (1) DNA degradation and con- 
tamination were monitored on 1% agarose gels; (2) DNA concentration 
was measured using the Qubit DNA Assay Kit and a Qubit 2.0 Fluorom- 
eter (Life Technologies). 

RNA was isolated in TRIzol reagent. Quality was verified by the 
following four methods: (1) RNA degradation and contamination were 
monitored on 1% agarose gels; (2) RNA purity was checked using the 
NanoPhotometer spectrophotometer (IMPLEN); (3) RNA concentra- 
tion was measured using the Qubit RNA Assay Kit and a Qubit 2.0 Fluo- 
rometer (Life Technologies); (4) RNA integrity was assessed using the 
RNA Nano 6000 Assay Kit and the Bioanalyzer 2100 system (Agilent 
Technologies). 


WGS library generation 

A total of 0.5 μg of genomic DNA per sample was used as input material 
for library preparation. Sequencing libraries were generated using the 
TruSeq Nano DNA HT Sample Prep Kit (Illumina) following the 
manufacturer’s recommendations, and index codes were added to each sample. 
In brief, genomic DNA was fragmented by sonication to a median size of 
350 bp. Then, DNA fragments were end-repaired, A-tailed, and ligated 
with the full-length Illumina sequencing adapters, followed by further 
PCR amplification. PCR products were purified (AMPure XP system), 
and libraries were analysed for size distribution using an Agilent Bio- 
analyzer 2100 and quantified via real-time PCR. 


WGBS library generation 
A total of 5.2 μg of genomic DNA per sample plus 26 ng of spiked-in 
lambda DNA were fragmented via sonication to 200-300 bp with a 
Covaris S220, followed by end repair and A-tailing. Cytosine-methylated 
barcodes were ligated to sonicated DNA as per manufacturer’s instruc- 
tions. DNA fragments were then treated twice with bisulphite using 
the EZ DNA Methylation-Gold Kit (Zymo Research), and the resulting 
single-stranded DNA fragments were PCR amplified using KAPA HiFi 
HotStart Uracil + ReadyMix (2x). Library concentration was quantified 
using a Qubit 2.0 Fluorometer (Life Technologies) and quantitative PCR, 
and the insert size was assayed on an Agilent Bioanalyzer 2100 system. 


RNA-seq library generation 

A total of 2 μg of RNA per sample was used as input material. 
Ribosomal RNA was removed using an Epicentre Ribo-zero rRNA Removal Kit 
(Epicentre), and rRNA-free residue was cleaned up via ethanol 
precipitation. Subsequently, sequencing libraries were generated from the 
rRNA-depleted RNA using a NEBNext Ultra Directional RNA Library 
Prep Kit for Illumina (NEB) following manufacturer’s recommenda- 
tions. In brief, fragmentation was carried out using divalent cations 
under elevated temperature in the NEBNext First Strand Synthesis 
Reaction Buffer (5x). First strand cDNA was synthesized using random 
hexamer primers and M-MuLV Reverse Transcriptase (RNase H minus). 
Second-strand cDNA synthesis was subsequently performed using DNA 
polymerase I and RNase H. In the reaction buffer, dNTPs with dTTP were 
replaced by dUTP. Remaining overhangs were converted into blunt 
ends via exonuclease/polymerase treatment. After 3’ adenylation, 
NEBNext Adaptors with a hairpin loop structure were ligated to the DNA 
fragments to prepare for hybridization. To preferentially select cDNA 
fragments of 150-200 bp, the library fragments were purified using an 
AMPure XP system (Beckman Coulter). Size-selected, adaptor-ligated 
cDNA was treated with 3 μl of USER Enzyme (NEB) at 37 °C for 15 min 
followed by 5 min at 95 °C before PCR. PCR was performed with Phu- 
sion High-Fidelity DNA polymerase, Universal PCR primers, and Index 
(X) Primer. Finally, products were purified (AMPure XP system), and 
library quality was assessed on the Agilent Bioanalyzer 2100 system. 


miRNA-seq library generation 

A total of 3 μg of total RNA per sample was used as input material for a 
small RNA library. Sequencing libraries were generated using the NEB- 
Next Multiplex Small RNA Library Prep Set for Illumina (NEB) following 
manufacturer’s recommendations, and index codes were added to 
associate sequences with each sample. In brief, 3’ SR Adaptor (NEB) was 
ligated to the 3’ end of small RNA. The SR RT Primer (NEB) was hybrid- 
ized to excess 3’ SR adaptor, and the 5’ end adaptor was then ligated. 
First-strand cDNA was synthesized using M-MuLV Reverse Transcriptase 
(RNase H minus). PCR amplification was performed using LongAmp 
Taq 2x Master Mix, SR Primer for Illumina, and Index (X) primer. PCR 
products were purified on an 8% polyacrylamide gel (100 V, 80 min). 
DNA fragments corresponding to 140-160 bp (the length of a small 
noncoding RNA plus the 3’ and 5’ adaptors) were recovered and 
dissolved in 8 μl of elution buffer. Finally, library quality was assessed on 
the Agilent Bioanalyzer 2100 system using DNA High Sensitivity Chips. 


Generation of sequencing data and quality control 

Clustering of the indexed samples was performed on a cBot Clus- 
ter Generation System using a HiSeq X PE Cluster Kit V2.5 (Illumina) 
according to the manufacturer’s instructions. WGS (208 tumour/ 
normal sample pairs), WGBS (187 pairs), and RNA-seq (134 pairs) 
libraries were sequenced on the Illumina HiSeq X TEN platform (2 × 150-bp 
paired-end reads). 50-bp single-end reads for miRNA-seq (105 tumour 
and normal sample pairs) were also generated on the Illumina HiSeq 
X TEN platform. 

Read pairs were discarded if (1) either read contained adaptor 
sequences (>10 nucleotides aligned to the adaptor, allowing < 10% 
mismatches); (2) either read contained more than 10% uncertain 
bases; or (3) either read contained more than 50% low quality bases 
(Phred quality < 5). The following statistics were calculated (and are 
available at http://www.cpgea.com): total number of reads; sequencing 
error rate; percentage of reads with average quality score > 20; 
percentage of reads with average quality score >30; and GC content 
distribution. 
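
The three read-pair filters described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names and the (sequence, Phred scores) tuple layout are hypothetical, "uncertain bases" is assumed to mean N calls, and adaptor detection is assumed to happen upstream and to be passed in as the aligned length and mismatch fraction.

```python
def discard_read(seq, quals, adaptor_len=0, adaptor_mismatch=0.0):
    """Return True if one read fails any of the three filters in the text."""
    n = len(seq)
    if n == 0:
        return True  # assumption: empty reads are discarded
    # (1) adaptor: >10 nt aligned to the adaptor with <10% mismatches
    has_adaptor = adaptor_len > 10 and adaptor_mismatch < 0.10
    # (2) more than 10% uncertain bases (assumed here to be 'N' calls)
    too_many_n = seq.upper().count("N") / n > 0.10
    # (3) more than 50% low-quality bases (Phred quality < 5)
    low_quality = sum(q < 5 for q in quals) / n > 0.50
    return has_adaptor or too_many_n or low_quality


def discard_pair(read1, read2):
    """A pair is discarded if either mate fails; each read is (seq, quals)."""
    return discard_read(*read1) or discard_read(*read2)
```

Because the rule is applied per pair, a single failing mate removes both reads, which keeps the paired-end files synchronized.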


WGS processing 

Sequencing reads were aligned to the Human Genome Reference 
Consortium build 38 (GRCh38) using BWA v.0.7.8 (BWA-mem). 
Unaligned reads that passed Illumina’s quality filter (PF reads) were 
retained. We used the ‘Picard’ pipeline 
(http://broadinstitute.github.io/picard) to combine data from multiple 
libraries and flow cell runs into a single BAM file per sample. Only 
uniquely aligned, de-duplicated reads 
were used in subsequent analyses. Quality scores were recalibrated 
using the Genome Analysis Toolkit (GATK) Table Recalibration tool. 
All sites potentially containing small insertions or deletions in either 
the tumour or the matched normal were realigned using GATK. The 
sample cross-individual contamination levels were estimated using the 
Conpair program*. A total of 208 tumour-normal pairs from samples 
with contamination less than 5% (maximum 4.7%, minimum 0.4%) were 
included in downstream analysis. 


WGBS processing 

FastQC v.0.11.5 was used to estimate the quality of the raw reads. Reads 
were pre-processed with Trimmomatic v.0.36 using the parameters 
(SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 ILLUMINACLIP:adaptor. 
fa:2:30:10 MINLEN:36). Reads that passed all filtering steps were 
considered clean reads, and all subsequent analyses were performed 
on those reads. FastQC was used to perform basic statistics on the 
quality of the clean reads. 

Bismark™ v.0.16.3 was used to align bisulphite-treated reads to a 
human bisulfite-converted reference genome (-X 700 --dovetail). The 
human reference genome was first transformed into a bisulfite-con- 
verted version (C-to-T and G-to-A converted) and then indexed using 
bowtie2°’. Sequence reads were also transformed into fully bisulfite- 
converted versions before they were aligned to the genome in a 
directional manner. Reads that produced a unique best alignment from the 
two alignment processes (original top and bottom strand) were then 
compared to the normal genomic sequence, and the methylation state 
of each cytosine position in the read was inferred. Reads aligned to 
the same genomic region were considered duplicates, and sequenc- 
ing depth and coverage were summarized using de-duplicated reads. 

Methylation extractor results (bismark_methylation_extractor 
--no_overlap) were converted to bigWig format for visualization using the 
IGV browser. The bisulfite conversion rate was calculated as the per- 
centage of thymine sequenced at cytosine reference positions in the 
lambda genome. 
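
Two of the steps above lend themselves to a short sketch: the in silico conversion of a sequence into its C-to-T and G-to-A versions, and the conversion rate computed from the unmethylated lambda spike-in. The Python below is an illustration only; the function names are hypothetical, and the aligner performs the conversion internally rather than through code like this.

```python
def in_silico_convert(seq):
    """Return the (C-to-T, G-to-A) fully converted versions of a sequence."""
    s = seq.upper()
    return s.replace("C", "T"), s.replace("G", "A")


def conversion_rate(t_calls, c_calls):
    """Percentage of thymine calls at cytosine reference positions.

    t_calls: reads showing T at lambda reference cytosines (converted)
    c_calls: reads showing C at the same positions (unconverted)
    """
    total = t_calls + c_calls
    if total == 0:
        raise ValueError("no coverage at lambda cytosine positions")
    return 100.0 * t_calls / total
```

For example, 995 thymine calls and 5 cytosine calls at lambda cytosine positions correspond to a conversion rate of 99.5%.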


RNA-seq processing 

RNA-seq reads were aligned to the Ensembl reference genome and 
gene model annotation files (release 84, http://ftp.ensembl.org/pub/ 
release-84/fasta/homo_sapiens/dna/ and http://ftp.ensembl.org/pub/ 
release-84/gtf/homo_sapiens/). A reference genome index was built 
using Bowtie v.2.0.6”, and paired-end clean reads were aligned to the 
reference genome using TopHat v.2.0.9%*. 

Raw count data per gene was calculated using HTSeq>. The raw 
count matrix was then used by DESeq2* to quantify gene expression 
level as normalized counts. Cuffdiff*’ was used to detect differentially 
expressed genes between tumours and normal samples™. Transcripts 
with an adjusted P< 0.05 were considered differentially expressed. 


miRNA-seq processing 

High-quality, 18-35-bp miRNA-seq reads were aligned to the reference 
genome using bowtie v.1.0.1. Aligned small RNA tags were matched to 
the known reference miRNA database, miRBase20.0, using mirdeep2 
v.1.1°° with the following modified parameters: -i -r -M -m -k -p 10 -g 
50000. srna-tools-cli was then used to obtain the potential miRNA 
and to draw the secondary structures. Small RNA tags originating 
from protein-coding genes, repeat sequences, rRNA, tRNA, small 
nuclear RNA and small nucleolar RNA were removed by mapping tags 
to RepeatMasker v.4.0.3 (-species -nolow -no_is -norna -pa 8), Rfam 
database. Known miRNA expression levels were estimated as TPM using 
the formula: normalized expression = (mapped read counts)/(total 
reads) x 1,000,000. 
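
The normalization formula above can be written as a one-line function. The sketch below is a minimal illustration; the function name is hypothetical, and the multiplication is performed before the division only to keep the arithmetic exact for integer counts.

```python
def normalized_expression(mapped_read_count, total_reads):
    """Normalized expression = mapped read counts / total reads x 1,000,000."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_read_count * 1_000_000 / total_reads
```

A miRNA with 50 mapped reads in a library of 2,000,000 reads would therefore have a normalized expression of 25.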


Mutation calling 

The GATK HaplotypeCaller® was used to perform germline mutation 
calling. Somatic SNVs were detected using MuTect® v.1.1.4. A mini- 
mum of five variant-containing reads and a variant allele frequency 
(VAF) ≥ 0.04 in the tumour were required for mutation calling. In 
addition, we used the 208 matched normal samples from this study 
to build the panel of normals (PoNs) and removed any mutation with 
a corresponding alternate allele appearing in >1 PoN samples. Further 
filtering was performed using the fpfilter.pl script (https://github.com/ 
ckandoth/variant-filter) with default parameters and a VAF threshold 
of 4%. 

Because MuTect cannot call somatic indels, somatic indels were 
detected using Strelka® and Mutect2”. Only indels agreed on by both 
tools were retained. A minimum of five variant-containing reads and 
VAF > 0.04 in the tumour were required for mutation calling. Any indel 
appearing in more than one PoN sample was removed. In total, 1,167,497 
somatic mutations were included in our final set. 
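
The read-support, VAF and panel-of-normals criteria described above can be summarized in a small filter. This Python sketch is illustrative only: the `Call` record and its field names are hypothetical, and treating a VAF of exactly 0.04 as passing is an assumption, since the text states the SNV and indel thresholds slightly differently.

```python
from dataclasses import dataclass


@dataclass
class Call:
    alt_reads: int  # variant-containing reads in the tumour
    depth: int      # total tumour read depth at the site
    pon_hits: int   # PoN samples carrying the same alternate allele


def passes_filters(call, min_alt_reads=5, min_vaf=0.04, max_pon_hits=1):
    """Apply the minimum-read, VAF and PoN criteria from the text."""
    if call.depth <= 0:
        return False
    vaf = call.alt_reads / call.depth
    return (
        call.alt_reads >= min_alt_reads
        and vaf >= min_vaf                 # assumption: threshold is inclusive
        and call.pon_hits <= max_pon_hits  # removed if seen in >1 PoN samples
    )
```

In the study these criteria were combined with additional filtering (fpfilter.pl, and agreement between two callers for indels), which this sketch does not attempt to reproduce.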

ANNOVAR® was used to annotate VCF (variant call format) files. 
To ensure that no candidate driver mutations were mistakenly removed 
by the post-processing filtering, candidate mutations in previously 
implicated cancer genes were manually reviewed using the IGV 
browser“. 


Mutation burden and substitution rate 

Mutation burden was defined as the total number of non-synonymous 
somatic coding mutations within exonic regions (36 Mb for CPGEA 
and 32.7 Mb for TCGA). Non-synonymous mutations were defined as 
missense, nonsense and nonstop mutations; splice site mutations; 
frameshift deletions and insertions; and in-frame deletions and inser- 
tions. The substitution rate was calculated as the number of somatic 
variants across the entire genome (3,257,319,537 bp), including coding 
and noncoding regions. Outliers (T13 and T502) were not included in 
substitution rate and mutation burden calculations. 
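
As a rough illustration of the mutation burden definition above, the following Python sketch counts non-synonymous mutations and also reports them per megabase of exonic sequence. The class labels are hypothetical annotation strings, and the per-megabase normalization is an assumption suggested by the exome sizes quoted in the text rather than something the text states explicitly.

```python
# Hypothetical labels for the mutation classes listed in the text.
NON_SYNONYMOUS = {
    "missense", "nonsense", "nonstop", "splice_site",
    "frameshift_deletion", "frameshift_insertion",
    "inframe_deletion", "inframe_insertion",
}


def mutation_burden(mutation_classes, exome_mb=36.0):
    """Return (count, count per Mb) of non-synonymous coding mutations.

    exome_mb defaults to the 36 Mb quoted for CPGEA; 32.7 would apply to TCGA.
    """
    count = sum(1 for c in mutation_classes if c in NON_SYNONYMOUS)
    return count, count / exome_mb
```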


Somatic CNA detection 

Control-FREEC® v.6.7 was used to detect genomic segments with 
somatic CNAs from matched normal and tumour WGS mapped data. 
Genomic segments with frequent germline CNVs or intersecting black- 
list regions® were filtered out. The GISTIC2” algorithm was used to 
detect recurrently amplified or deleted genomic regions with the fol- 
lowing modified parameters: -ta 0.2 -td 0.2 -js 100 -broad 1 -brlen 0.7 
-conf 0.95 -genegistic 1 -savegene 1. In chromosome arm level analysis, 
chromosomal arms were considered altered if at least 60% of the arm 
was lost or gained with a relative log2-transformed copy number change 
>0.1. The CNV level for all genes was extracted from the GISTIC output 
files (all threshold_by_genes) using a cutoff of ±2. 


CNA clustering, CNA burden and BCR 

Tumours were clustered based on chromosome arm-level alterations 
identified by GISTIC. Hierarchical clustering was performed in R based 
on Euclidean distance using Ward’s method. To calculate CNA burden, 
the total genomic length of CNA segments was summed and divided by 
the total genomic length of the autosomal chromosomes per tumour. 
The mean CNA burden of 11.28% was used to stratify the patients as 
described”. Biochemical recurrence-free survival was calculated 
from the date of surgery to the date of diagnosis of biochemical 
recurrence. Differences in the BCR of patients in the two CNA burden 
groups were assessed using Kaplan-Meier analysis followed by a 
log-rank test. 
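
The CNA burden calculation and the stratification at the cohort mean can be sketched as below. This is a minimal Python illustration: the function names are hypothetical, segments are assumed to be non-overlapping autosomal intervals in half-open base-pair coordinates, and assigning tumours above the mean to the high-burden group is an assumption about the direction of the split.

```python
def cna_burden(segments, autosome_length_bp):
    """Percentage of the autosomal genome covered by CNA segments.

    segments: iterable of (start, end) pairs, assumed non-overlapping.
    """
    altered = sum(end - start for start, end in segments)
    return 100.0 * altered / autosome_length_bp


def burden_group(burden_percent, threshold=11.28):
    """Stratify a tumour by the mean CNA burden reported in the text."""
    return "high" if burden_percent > threshold else "low"
```

The two resulting groups are the inputs to the Kaplan-Meier and log-rank comparison, which would normally be run with a dedicated survival analysis package.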


Somatic structural variation detection and validation 

To accurately detect somatic breakpoints, we used the Meerkat 
software with default parameters®. The set of structural variant calls from 
each tumour was filtered by the calls from its matched normal to remove 
germline variants, as described®. 

Recurrent structural variation events affecting important genes were 
chosen for validation. The primers were designed using Meerkat’s 
primers.pl function. Sanger sequencing was performed on PCR products, 
and the reads were mapped to the SV breakpoints using UCSC’s BLAT 
tool. The validation process is described in detail in the supporting 
website (http://www.cpgea.com). 


Evaluation of chromothripsis and chromoplexy 

Chromothripsis was evaluated by Shatterseek (https://github.com/ 
parklab/ShatterSeek). Chromoplexy was evaluated by ChainFinder 
v.1.0.1° using a deletion threshold of 0.278 and a significance threshold 
of 0.05. The presence of chromoplexy was defined by the presence of a 
chromoplexy chain connecting at least three chromosomes. 


Fusion detection 

Fusion gene refers to the event in which partial or complete sequences 
of two individual genes are fused together and result in a chimeric gene, 
usually caused by chromosomal translocation. We used SOAPfuse®” 
v.1.27 software to detect and analyse fusion genes. SOAPfuse filter- 
ing was applied with default parameters: junction reads = 1 and split 
reads = 1. In addition, the junction location and whether it was an in- 
frame or out-of-frame fusion were considered. To remove fusion genes 
present in normal samples, all fusion genes detected by SOAPfuse in 
normal samples were merged together into a PoN. If a fusion breakpoint 
from a tumour sample was detected in the PoN, the corresponding 
fusion was removed. We compared fusion candidates to public fusion 
databases including FusionHub” and Oncofuse”, and annotated them 
in Supplementary Data 6. 

The Meerkat fusions.pl module was used to call fusions using WGS 
structural variation data, and the results were used to estimate the 
proportion of fusion candidates that resulted from genomic structural 
variation events. 


Fusion RT-PCR validation 

cDNA was synthesized from 300 ng of RNA using SuperScript II reverse 
transcriptase (Life Technologies) according to the manufacturer’s 
instructions. Primer sequences for fusion validations are listed in the 
supporting website (http://www.cpgea.com). Reverse transcription 
PCR (RT-PCR) was performed using standard protocols, after which 
products were visualized on a 1.2% agarose gel and purified using BigDye 
Terminator v.3.1 Cycle Sequencing Kits (Life Technologies). 


Genes associated with structural variation, and intersection of 
breakpoints with topologically associating domains 

The nearest-gene method was used to identify candidate genes affected 
by two different breakpoints. New structural variation events were 
identified in our cohort using the following method. First, events were 
categorized based on the location of the two breakpoints (Extended 
Data Fig. 3d): (1) tier 1, the two breakpoints hit the same gene (defined 
as 5’ UTR, promoter, coding exon, intron, or 3’ UTR); (2) tier 2, the two 
breakpoints hit two different genes; (3) tier 3, the first breakpoint hit 
a gene, and the second hit the intergenic region. If the gene was the 
closest gene to the second breakpoint within a 100-kb region, then the 
structural variation was categorized as tier 3.1. If the closest gene was 
another gene, the variation was categorized as tier 3.2. If the nearest 
was an enhancer, the variation was categorized as tier 3.3; (4) tier 4, the 
two breakpoints both hit the intergenic region. If one of the nearest 
genes was located within 100 kb, it was categorized as tier 4.1. If both 
were within 100 kb, it was categorized as tier 4.2. If the nearest genomic 
features for both breakpoints were enhancers, it was categorized as tier 
4.3. If one was close to an enhancer and the other was close to a gene, 
it was categorized as tier 4.4. If one was located in an enhancer and the 
other was at least 100 kb away from any gene, then it was categorized 
as tier 4.5; and (5) tier 5, the two breakpoints were at least 100 kb from 
any gene. These annotations were used to look for functional events 
related to structural variation, such as enhancer hijacking and 
disruption of topologically associating domains (TADs). 
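
The top-level tier assignment described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: each breakpoint is represented by a hypothetical pair (gene hit or None, distance in bp to the nearest gene), and the enhancer-based and other sub-tiers (3.1 to 3.3 and 4.1 to 4.5) are deliberately not modelled.

```python
def top_level_tier(bp1, bp2, window=100_000):
    """Assign tier 1-5 to a structural variation from its two breakpoints.

    Each breakpoint is (gene, distance): gene is the name of the gene hit,
    or None if intergenic; distance is the distance in bp to the nearest gene.
    """
    gene1, dist1 = bp1
    gene2, dist2 = bp2
    if gene1 is not None and gene2 is not None:
        return 1 if gene1 == gene2 else 2  # same gene vs two different genes
    if gene1 is not None or gene2 is not None:
        return 3                           # one genic, one intergenic
    if dist1 >= window and dist2 >= window:
        return 5                           # both at least 100 kb from any gene
    return 4                               # both intergenic, a gene is nearby
```

A full implementation would also need an enhancer annotation to resolve the sub-tiers, since some of them depend on whether the nearest feature is an enhancer rather than a gene.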

The TAD definitions for the LNCaP cell line were downloaded from 
http://promoter.bx.psu.edu/hi-c/publications.html. We intersected our 
structural variation breakpoints with TAD assignments. Structural vari- 
ation events with breakpoints located in different TADs were considered 
potential candidates for destroying the TAD boundary. The expression 
level of genes within the TADs was compared between tumours with 
and without the structural variation event (Extended Data Fig. 3c). 
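
The intersection of breakpoints with TAD assignments can be illustrated with a simple lookup. The Python sketch below makes several assumptions: both breakpoints lie on the same chromosome, TADs are given as a sorted list of non-overlapping half-open (start, end) intervals, and the function names are hypothetical.

```python
import bisect


def tad_index(tads, pos):
    """Return the index of the TAD containing pos, or None if outside any TAD."""
    starts = [start for start, _ in tads]
    i = bisect.bisect_right(starts, pos) - 1
    if i >= 0 and tads[i][0] <= pos < tads[i][1]:
        return i
    return None


def spans_tad_boundary(tads, pos1, pos2):
    """True if the two breakpoints fall in different TADs."""
    a, b = tad_index(tads, pos1), tad_index(tads, pos2)
    return a is not None and b is not None and a != b
```

Events flagged in this way are only candidates; in the study, support came from comparing the expression of genes within the affected TADs between tumours with and without the event.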


Detection of significantly mutated genes 

To identify driver mutations and genes, we adopted previously 
described methods””. For mutation calling, we used the classic GATK 
toolbox and annotated mutations using ANNOVAR, as described in 
‘Mutation calling’. After filtering for artefacts and defining a final set of 
mutations, the MAF was analysed to determine significantly mutated 
genes (SMGs). This was accomplished using the MutSigCV“ and MuSiC” 
algorithms based on 206 samples””. Two outliers were not included 
in this analysis. For the MuSiC SMG test, genes with an FDR < 0.2 in two 
out of three tests were retained. For MutSigCV, q < 0.01 was used as a 
cut-off value. We carefully curated the SMGs in our study as follows. 
We compiled a blacklist according to the MutSigCV paper”. Further- 
more, genes were removed if they had an average reads per kilobase 
per million reads (RPKM) value less than 1.5. In our study, we not only 
compared our SMGs to those previously reported in 12 large Western 
cohorts, but also to two expert-curated databases (1,086 consensus 
cancer genes from Cancer Gene Census (CGC)” and OncoKB”). 


Comparison of mutation landscapes across cohorts of prostate 
cancer 

From 14 previous prostate cancer studies (13 Western cohorts and 
1 Chinese cohort), we compiled a total of 2,641 non-redundant tumours, 
including 1,656 primary tumours and 880 metastatic tumours. 
In addition, 54 samples were prostate neuroendocrine carcinoma. We 
excluded 8 cell lines, 8 xenografts, and 35 samples with unclear defini- 
tion. Metadata for the Western cohorts is presented in Supplementary 
Data 1, and all codes are provided at http://www.cpgea.com. For CPGEA, 
206 samples were used, excluding 2 outliers. 

For mutations in protein-coding genes, non-synonymous coding 
mutations were counted per tumour and then divided into primary, 
metastatic or neuroendocrine subsets based on sample metadata. 
A Pearson’s χ² test was used to evaluate the difference in mutation 
frequencies. The Benjamini-Hochberg method was used to correct 
P values. 
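
The frequency comparison above combines a Pearson's chi-squared test with Benjamini-Hochberg correction. The Python sketch below illustrates both for a single 2 × 2 table (mutated versus wild-type counts in two cohorts). It is a sketch under stated assumptions: no continuity correction is applied, the function names are hypothetical, and the study may have used a standard statistics package instead.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson's chi-squared statistic and P value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-squared with 1 d.f.: erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Walking from the largest P value downwards enforces the monotonicity of the adjusted values, so a gene never receives a smaller adjusted P value than one ranked ahead of it.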

For gene-level CNA comparisons, we converted gene IDs into the 
consistent gene symbol by removing duplicated and nonsense symbols. 
Eleven Western cohorts were used in this analysis (CPCG-2017 does 
not have CNA data). In addition, 208 CPGEA tumours and 114 TCGA 
WGS datasets were processed using the CPGEA pipeline and included 
in the comparison. For the cases used in several studies, we kept only 
one result. In total, 1,326 primary, 868 metastatic, and 54 neuroen- 
docrine prostate tumours were included in the comparison of gene- 
level amplification and deletion frequencies. GISTIC2 was also used to 
identify recurrent CNA genomic lesions. The CNA status of all genes 
was extracted from the GISTIC output files (all threshold_by_genes) 
using a cutoff of ±2. We annotated CNAs found both in CPGEA and in 
any Western cohort as ‘recurrent 1’, and CNAs found more than twice 
in the Western cohorts but not in CPGEA as ‘recurrent 2’. We annotated 
CNAs found only in CPGEA as ‘novel’. These annotations can be found in 
the column ‘class’ in Supplementary Data 3. A Pearson’s χ² test was used 
to evaluate differences in CNA frequency. The Benjamini-Hochberg 
method was used to correct P values. We also annotated the genes with 
the CGC; OncoKB, which called oncogenes and tumour-suppressor 
genes; and pathways curated from the literature. SMGs detected in 
this study were also annotated. The CNA circos plot includes 225 genes 
that were significantly altered between CPGEA and Western primary 
tumours (P<0.01) and that were either SMGs or annotated by the CGC 
or OncoKB databases. 


Noncoding mutation detection 

We used FunSeq2” to detect recurrent noncoding mutations. FunSeq2 
prioritizes noncoding mutations based on their relative location within 
regulatory elements, nucleotide-level effect, conservation, potential 
target gene, and recurrence across cancer samples. After choosing 
noncoding SNVs with a FunSeq2 score >1.5, we applied hotspot analysis, 
regional recurrence analysis, and transcription factor motif analysis”® 
to the 41,109 potentially functional somatic noncoding SNVs, following 
previously described methods” (Extended Data Fig. 5). 


AR output score of FOXA1-mutant tumours 

The AR output score was calculated as previously described’. In brief, 
z-scores for 20 androgen-induced genes were computed by subtracting 
the pooled mean from the RNA-seq expression values and dividing by 
the pooled standard deviation. The sum of the z-scores for the AR sig- 
nalling gene signature represents the AR output score for each sample. 
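
The AR output score can be sketched as a sum of per-gene z-scores. In the Python illustration below, the function name and input layout are hypothetical, the "pooled" mean and standard deviation are assumed to be computed per gene across all samples, and the sample (n − 1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def ar_output_scores(expression):
    """Sum of per-gene z-scores for each sample.

    expression: dict mapping gene name -> list of expression values,
    one per sample, with samples in the same order for every gene.
    """
    n_samples = len(next(iter(expression.values())))
    scores = [0.0] * n_samples
    for values in expression.values():
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # assumption: genes with no variance contribute nothing
        for i, v in enumerate(values):
            scores[i] += (v - mu) / sd
    return scores
```

In the study the signature comprised 20 androgen-induced genes, so each sample's score would be the sum of 20 such z-scores.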


Allele-specific expression of FOXA1 

For SNP mutations, WGS and RNA-seq read counts were extracted 
directly from the corresponding BAM files. For indel mutations, we 
constructed the de novo mutation in silico and extracted the corre- 
sponding read counts. The minimum coverage for both DNA and RNA 
was 20. Variants with |RNA_MAF − DNA_MAF| > 0.2 and an FDR < 0.01 
(calculated using R package q-value on the P value from the two-sided 
Fisher’s exact test) were considered to show allele-specific expression. 
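
The per-variant part of the allele-specific expression test can be sketched as below. This Python illustration covers only the coverage check, the allele-fraction difference and the two-sided Fisher's exact test on the DNA and RNA read counts; the subsequent FDR estimation across variants is not reproduced. The function names are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(pmf(x) for x in range(lo, hi + 1)
                if pmf(x) <= observed * (1 + 1e-9))
    return min(1.0, total)


def ase_candidate(dna_ref, dna_alt, rna_ref, rna_alt, min_cov=20, min_delta=0.2):
    """Return (passes_delta, P value), or None if coverage is below min_cov."""
    dna_cov, rna_cov = dna_ref + dna_alt, rna_ref + rna_alt
    if dna_cov < min_cov or rna_cov < min_cov:
        return None
    delta = abs(rna_alt / rna_cov - dna_alt / dna_cov)
    return delta > min_delta, fisher_two_sided(dna_ref, dna_alt, rna_ref, rna_alt)
```

A variant would be reported only if it passes the allele-fraction difference and its P value survives the FDR threshold once all variants have been tested.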


Clonal analysis of FOXA1 mutations 

The clonal status of FOXA1 mutations was determined using the cancer 
cell fraction (CCF) following the previously described method’. CCF 
was estimated as the proportion of cancer cells with an alteration, and 
the recommended threshold was used to separate clonal events and 
subclonal events. The algorithm takes into consideration VAF, tumour 
purity (p), and local copy number calls ($CN_{\mathrm{normal}}$ and $CN_{\mathrm{tumour}}$) to calculate 
CCF using the following formula: 


$$\mathrm{CCF} = \mathrm{VAF} \times \frac{CN_{\mathrm{normal}} \times (1 - p) + CN_{\mathrm{tumour}} \times p}{p}$$ 


Then, a binomial distribution was used to estimate the probability 
of being clonal or subclonal, and the 95% confidence interval was cal- 
culated. For each variant, the alternative reads t and total depth R met 
the following binomial distribution: P(CCF) = binom(t | R, VAF(CCF)). As 
previously described”, if the 95% confidence interval of CCF overlapped 
1, the variant was determined to be clonal; otherwise, the variant was 
determined to be subclonal. In total, 2 of the 90 FOXA1 mutations from 
tumours T211 and T521 were excluded owing to the absence of local copy 
numbers for those samples, and the clonal status of 88 FOXA1 mutations 
was determined using the R package Hmisc and the binconf function. 
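
The clonality call can be sketched in Python as below, reading the formula above as CCF = VAF × (CN_normal × (1 − p) + CN_tumour × p)/p. This is an illustration under stated assumptions rather than the authors' code: the confidence interval is a Wilson interval on the observed VAF (assumed here to correspond to the default of the binconf function), it is scaled to CCF with the same factor, and an interval lying entirely above 1 is also treated as clonal. All function names are hypothetical.

```python
import math


def ccf(vaf, purity, cn_normal, cn_tumour):
    """Cancer cell fraction from VAF, tumour purity and local copy numbers."""
    return vaf * (cn_normal * (1 - purity) + cn_tumour * purity) / purity


def wilson_interval(k, n, z=1.959964):
    """Wilson 95% confidence interval for a binomial proportion k / n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def clonal_status(alt_reads, depth, purity, cn_normal, cn_tumour):
    """Call a variant clonal if the upper CCF bound reaches 1."""
    _, vaf_hi = wilson_interval(alt_reads, depth)
    ccf_hi = ccf(vaf_hi, purity, cn_normal, cn_tumour)
    return "clonal" if ccf_hi >= 1.0 else "subclonal"
```

For instance, 50 variant reads at a depth of 200 in a diploid region of a tumour with 50% purity give a CCF of 1.0 and a clonal call, whereas 10 variant reads at the same depth give a subclonal call.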


Crystal structure of mutant FOXA1 

The primary amino acid sequence of human FOXA1 was obtained from 
the SWISS-PROT protein sequence database (ID: P55317). The sequence 
template homologous to FOXA1 was obtained from the Protein Data 
Bank (PDB; http://www.rcsb.org/pdb) using a PSI-BLAST search (PDB 
code 1VTN®°). The VMD® program was used to embed the complexes 
of wild-type FOXA1 and mutant FOXA1 interacting with DNA. 


DNA methylation level of mutant FOXA1-binding sites 

As a surrogate for new FOXA1-binding sites in tumours with mutant 
FOXA1, we identified FOXA1 binding motifs outside of FOXA1 ChIP-seq 
peaks from ENCODE (aggregate peaks from all cell lines)®? (Extended 
Data Fig. 6d, top). Motifs within the union set of hypoDMRs in tumours 
were chosen for DNA methylation analysis, using the strongest binding 
motif per peak or per hypoDMR. DNA methylation levels per FOXA1 
motif (+ 50 bp) were calculated, and an average methylation level 
was calculated per sample. FOXA1 frameshift or truncated mutations 
include frameshift indel, in-frame indel and nonsense mutations. 

We also used two sets of FOXA1-binding sites experimentally vali- 
dated in a prostate tumour cell line in a recent publication®: binding 
sites for wild-type FOXA1 and class-2 mutant FOXA1 (Extended Data 
Fig. 6d, bottom). Wild-type FOXA1-binding sites were obtained by merg- 
ing two sets of ChIP-seq peaks (GSM3508092 and GSM3508095) and 
intersecting ENCODE FOXA1 peaks. Class-2 mutant FOXA1-binding 
sites were obtained by merging three ChIP-seq peaks (GSM3508089, 
GSM3508098 and GSM3508101) and excluding wild-type binding sites. 
Peak summits were used as FOXA1-binding sites, and DNA methylation 
levels were calculated using smoothed CpG methylation levels from 
DSS®. 


PMD detection and analysis 

PMDs were identified in each sample using a hidden Markov model- 
based tool, MethPipe v.3.4.3°*. Raw CpG methylation data with meth- 
ylated and unmethylated read counts were used as input. The default 
non-overlapping bin size of 1,000 bp was used, and the bin-level was 
modelled with a two-state hidden Markov model. MethPipe further 
processed candidate PMDs by trimming ends and merging adjacent 
candidate PMDs. PMDs with a score lower than 100 or whose genomic 
length was less than 100 kb were filtered out. PMDs from individual 
tumours were merged to make a union set of 2,218 PMDs. The average 
PMD methylation level was calculated using smoothed CpG methyla- 
tion levels from DSS*® over the union set of PMDs. Mutation frequencies 
inside and outside PMDs were calculated based on individual 
tumour-specific PMDs (Extended Data Fig. 7i). Comparisons of gene expression 
levels inside and outside PMDs were based on the union set of PMDs, 
so that gene sets were identical across samples (Extended Data Fig. 7j). 
The mean expression level across tumours or normal samples was 
calculated per gene. 
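
The filtering and merging steps above can be sketched in a few lines. This is a minimal illustration, not the MethPipe implementation: the function names, the tuple layout and the half-open coordinate convention are assumptions made here, and only the two thresholds (score of 100, length of 100 kb) come from the text.

```python
from typing import List, Tuple

# Hypothetical layouts: (chrom, start, end, score) for candidates and
# (chrom, start, end) for filtered PMDs; coordinates are half-open, in bp.
Candidate = Tuple[str, int, int, float]
Interval = Tuple[str, int, int]


def filter_pmds(candidates: List[Candidate],
                min_score: float = 100.0,
                min_length: int = 100_000) -> List[Interval]:
    """Drop candidates scoring below min_score or shorter than min_length."""
    return [(chrom, start, end)
            for chrom, start, end, score in candidates
            if score >= min_score and (end - start) >= min_length]


def merge_union(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a union set."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Pooling the filtered PMDs of every tumour and passing them to `merge_union` would give a union set of the kind described above. Whether touching intervals should be merged is a choice made for this sketch.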


DMR detection and analysis 

DMRs were identified using DSS v.2.14.0% with the raw CpG methyla- 
tion data with methylated and unmethylated read counts as input. 
Differential methylation of CpGs between each tumour and matched 
normal sample was first statistically tested without replicates using 
the following command and parameters: DMLtest(smoothing = TRUE, 
smoothing.span=500). The number of epimutations per tumour was 
calculated by counting CpGs with an absolute methylation difference 
>0.2 (Fig. 1). Then, a stringent set of DMRs per tumour was identified 
using the following command and parameters: callDMR(delta = 0.2, 
p.threshold = 107°, minlen = 200, minCG = 5, dis.merge = 50, 
pct.sig = 0.5). DMRs were divided into hypo- and hyperDMRs based on 
the direction of methylation change. To exclude the large-scale hypo- 
methylation effect of PMDs, hypoDMRs within the PMDs were excluded. 
DMRs from individual tumours were merged to make a union set of 
96,037 hypoDMRs and 17,131 hyperDMRs. Recurrent DMRs were defined 
as DMRs shared by at least 10 tumours (Extended Data Fig. 8a, b). 
Average methylation levels of DMRs were calculated using smoothed 
CpG methylation levels in the union set of hypoDMRs and hyperDMRs. 
The genomic location of recurrent DMRs was annotated using 
BEDTools v.2.27.1% with gene annotations downloaded from GENCODE 
release 27°°. Promoters were defined as 2 kb upstream and 200 bp 
downstream of the transcription start sites. The enhancer database 
was downloaded from GeneHancer v.4.7”°. The genomic location 
of DMRs was assigned in the following order: coding exon, pro- 
moter, enhancer, 5’ UTR, 3’ UTR, intron and intergenic regions. Gene 
Ontology analysis was performed using the tool GREAT v.3.0.0%. 
Genomic coordinates of DMRs were lifted over to hg19 from hg38 for 
GREAT input. 
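
The promoter definition and the ordered assignment of genomic location can be illustrated as follows. This is a sketch under stated assumptions, not the BEDTools-based procedure used in the study: the function names, the feature dictionary and the half-open coordinates are hypothetical, and a DMR is assigned to the first category in the priority list that it overlaps.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open, in bp

# Assignment order taken from the text; anything unmatched is intergenic.
PRIORITY = ["coding exon", "promoter", "enhancer", "5' UTR", "3' UTR", "intron"]


def promoter_window(tss: int, strand: str,
                    upstream: int = 2000, downstream: int = 200) -> Tuple[int, int]:
    """Return the promoter interval around a TSS, respecting strand."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    return max(0, tss - downstream), tss + upstream


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate(dmr: Interval, features: Dict[str, List[Interval]]) -> str:
    """Assign a DMR to the highest-priority category it overlaps."""
    chrom, start, end = dmr
    for category in PRIORITY:
        for f_chrom, f_start, f_end in features.get(category, []):
            if f_chrom == chrom and overlaps(start, end, f_start, f_end):
                return category
    return "intergenic"
```

A linear scan is adequate for a handful of regions; for genome-scale annotation an interval index would be the more practical choice.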


CIMP detection and analysis 

The CpG island methylator phenotype was identified based on recur- 
rently methylated promoter CGIs. The mean methylation level for 
18,584 CGIs located in gene promoters was calculated for each sample 
using smoothed CpG methylation levels. The CGIs defining the CIMP 
(CIMP-CGIs) were selected based on the following criteria: (1) average 
methylation level across normal samples < 0.4; (2) difference in methyl- 
ation level between tumour and matched normal samples >0.3 in more 
than half (>93) of sample pairs. A total of 289 CIMP-CGIs were identi- 
fied. Hierarchical clustering was performed across all tumours using 
the methylation levels of these CGIs. CIMP+ tumours were defined as the 
cluster with the most hypermethylated CIMP-CGIs, similar to the method 
previously described®**”. A Fisher’s exact test was performed to identify 
associations of CIMP+ tumours with genetic alterations. Differences in 
the BCR of subjects with CIMP+ and CIMP− tumours were assessed using 
Kaplan-Meier analysis followed by a log-rank test. 
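
The two selection criteria for CIMP-CGIs can be expressed compactly. The sketch below assumes a hypothetical input layout (one list of mean methylation levels per CGI, with tumour and normal values paired by index); only the thresholds of 0.4 and 0.3 and the "more than half of sample pairs" rule come from the text.

```python
from typing import Dict, List


def select_cimp_cgis(normal: Dict[str, List[float]],
                     tumour: Dict[str, List[float]],
                     max_normal_mean: float = 0.4,
                     min_delta: float = 0.3) -> List[str]:
    """Return CGIs that are lowly methylated in normal samples and gain
    more than min_delta methylation in more than half of the pairs."""
    selected = []
    for cgi, normal_levels in normal.items():
        tumour_levels = tumour[cgi]
        n_pairs = len(normal_levels)
        mean_normal = sum(normal_levels) / n_pairs
        # Count pairs in which the tumour exceeds its matched normal sample.
        n_gain = sum(1 for t, n in zip(tumour_levels, normal_levels)
                     if t - n > min_delta)
        if mean_normal < max_normal_mean and n_gain > n_pairs / 2:
            selected.append(cgi)
    return selected
```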


Epimutation burden and analyses 

The epimutation burden of a tumour was calculated as the number 
of CpGs with a methylation difference >0.2 between the tumour and 
its matched normal sample divided by the total number of CpGs in 
the genome. Correlations between the epimutation burden and other 
mutational or clinicopathological features were tested using 
Spearman’s rank-order correlation. 


miRNA clustering 

We selected the 98 most variable miRNAs (17%) from 105 tumours for 
clustering. Only miRNAs expressed in more than 10% of samples were 
used. We transformed each row of the matrix by log10(TPM + 1) and 
median-centred the matrix, then used the pheatmap package to scale 
the rows. We used ward.D2 and Euclidean distance measures for 
clustering of the columns and rows, respectively. 
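
The preprocessing before clustering can be sketched as below. Several details are assumptions made for illustration rather than statements of the authors' procedure: a miRNA counts as expressed when TPM > 0, the logarithm is taken to base 10, variability is ranked by variance after the log transform, and median-centring is applied per row. The clustering itself (pheatmap in R) is not reproduced.

```python
import math
from statistics import median, pvariance
from typing import Dict, List


def preprocess(tpm: Dict[str, List[float]],
               min_fraction_expressed: float = 0.10,
               n_top: int = 98) -> Dict[str, List[float]]:
    """Filter, log-transform, select the most variable rows, median-centre."""
    logged: Dict[str, List[float]] = {}
    for name, values in tpm.items():
        fraction = sum(1 for v in values if v > 0) / len(values)
        if fraction > min_fraction_expressed:
            logged[name] = [math.log10(v + 1.0) for v in values]

    # Keep the n_top rows with the largest variance across samples.
    top = sorted(logged, key=lambda n: pvariance(logged[n]), reverse=True)[:n_top]

    centred: Dict[str, List[float]] = {}
    for name in top:
        row_median = median(logged[name])
        centred[name] = [v - row_median for v in logged[name]]
    return centred
```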


iCluster 

Integrative clustering of three genomic data types for all available 
samples (127 patients) was performed using the R package iClusterPlus”, 
with the following inputs: (1) somatic CNAs defined as the merged 
copy-number segments identified by Control-FREEC; (2) the 1,600 most 
variable genes detected by RNA-seq; and (3) the 4,890 most variable 
DMRs defined using WGBS. 

We ran iClusterPlus.tune with different numbers of possible clusters 
(n=3-5), choosing the number of clusters at which the percentage 
of explained variation levelled off (n = 4) and the clustering with the 
lowest Bayesian information criterion. The number of clusters (k) was 
estimated. We computed a deviance ratio metric, in which k was chosen 
to maximize the deviance ratio. We chose k = 3, which is the elbow point, 
to construct the model. We also performed unsupervised clustering 
on each of the three data types individually. Gene expression data were 
clustered by hierarchical clustering with the Ward.D2 method and 
Pearson correlation as the distance metric. Somatic CNA data were 
clustered by hierarchical clustering with the Ward.D2 method and 
Manhattan distance. Methylation data were clustered by hierarchical 
clustering with the Ward.D2 method and Manhattan distance. 

Association analysis of clinical features and molecular changes 
with different iCluster subtypes was performed using Kruskal-Wallis, 
Wilcoxon rank-sum, or Fisher’s exact tests. Differences in somatic CNAs, 
fusions, somatic mutations and DNA methylation across different 
iCluster subtypes were tested using ANOVA. Gene set enrichment analysis” 
was used to detect the gene sets more highly expressed in one subtype 
than the other three subtypes. Difference in the BCR of the subjects 
across the iCluster subtypes was assessed using Kaplan-Meier analysis 
followed by a log-rank test. 


Pathway comparison and visualization 

Genetic lesions in 12 selected oncogenic pathways were compared 
between the CPGEA, TCGA and SU2C cohorts, which represent Chinese 
primary, Western primary, and Western metastatic prostate cancer, 
respectively. Genetic alterations included coding mutations and CNAs. 
We used information about oncogenic and clinically actionable muta- 
tions from the OncoKB database and CGC to determine whether the 
predicted effect would manifest in the tumour based on the observed 
genomic alteration. Somatic alterations that were labelled oncogenic, 
TSG, or oncogene/TSG either in OncoKB or CGC were used. For CNAs, 
we determined whether the annotated genes were functionally ampli- 
fied or deleted in each sample, and only amplifications and deletions 
were used for oncogenes and tumour-suppressor genes, respectively. 
Genes with discordant annotations in the two databases were used 
only if their expression levels were significantly different in tumours 
(P < 0.05). If the gene expression was upregulated, we annotated the 
gene as an oncogene, and if downregulated, as a TSG (Extended Data 
Fig. 10b, c, http://www.cpgea.com). 
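
The rule for genes with discordant annotations, and the restriction on which CNAs count, can be written as a small decision function. This is an illustrative sketch only: the role labels, argument names and the use of a signed fold change to indicate direction are assumptions, and genes labelled as both oncogene and TSG within a single database are not handled.

```python
from typing import Optional


def resolve_role(role_oncokb: str, role_cgc: str,
                 p_value: Optional[float] = None,
                 log_fold_change: Optional[float] = None,
                 alpha: float = 0.05) -> Optional[str]:
    """Return 'oncogene', 'TSG', or None if the gene should not be used."""
    if role_oncokb == role_cgc:
        return role_oncokb
    # Discordant annotations: require a significant expression difference.
    if p_value is None or log_fold_change is None or p_value >= alpha:
        return None
    return "oncogene" if log_fold_change > 0 else "TSG"


def cna_is_counted(role: str, cna_type: str) -> bool:
    """Only amplified oncogenes and deleted TSGs are counted."""
    return (role == "oncogene" and cna_type == "amplification") or \
           (role == "TSG" and cna_type == "deletion")
```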

For noncoding mutations, the mutations selected from the hotspot 
analysis (Extended Data Fig. 5c) were assigned to oncogenic pathway 
genes. For epigenetic alterations, we assigned local recurrent DMRs 
to their nearest gene. For structural variation deletions and amplifi- 
cations, we assigned all genes within the alteration to the event. For 
inversions, inter-chromosomal structural variations, and tandem 
duplications, we calculated the genes hit by breakpoints (Extended 
Data Fig. 10e). 


Supporting website 

The supporting website (http://www.cpgea.com) contains the fol- 
lowing: (1) an analysis workflow page that contains all bioinformatics 
pipelines for data processing and genetic and epigenetic mutation 
detection with detailed parameters; (2) a sequencing information 
page with WGS, WGBS, RNA-seq and miRNA-seq data by sample; (3) 
a validation page with all validation results, including FOXA1 muta- 
tions, fusion events, and structural variation events; and (4) a pathway 
page that contains the alteration frequencies of important oncogenic 
pathway genes by cohort (CPGEA, TCGA and SU2C) and by alteration 
type (fusion, structural variation, noncoding and DNA methylation), 
including alterations in individual patients. Detailed information such 
as percentage by alteration type, the location of coding mutations as 
a lollipop diagram, and a link to the epigenome browser can be found 
by clicking gene names or frequencies. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All data, including raw data, mutation calls, and clinical information, 
have been deposited to the Genome Sequence Archive for Human 
(http://bigd.big.ac.cn/gsa-human/) at the BIG Data Center, Beijing 
Institute of Genomics, Chinese Academy of Sciences, under the acces- 
sion number PRJCA001124. The raw sequencing data and somatic and 
germ-line mutation calls contain information unique to an individual 
and require controlled access. The deposited and publicly available 
data are compliant with the regulations of the Ministry of Science and 
Technology of the People’s Republic of China. Source Data for Figs. 2, 
4 and Extended Data Figs. 6-8 are provided with the paper. 


Code availability 


All computational code used in this study is available at the supporting 
website (http://www.cpgea.com). 

Extended Data Fig. 1 | Clinical samples, data generation and somatic 
mutation landscape of CPGEA. a, Clinical and pathological patient 
characterization. b, Study design, indicating the number of tumours with each 
data type. The cohort consisted of 208 patients who underwent radical 
prostatectomy. All tumours were analysed by WGS, as was a matched normal 
para-tumour specimen from each patient. In addition, RNA-seq (n=134 
tumours), miRNA-seq (n=105), and whole-genome DNA methylation (n =187) 
data were generated for a subset of patients. c-f, Comparison between somatic 
alteration calls from two pipelines for the TCGA PRAD (primary prostate 
tumour) cohort. ‘CPGEA pipeline’ indicates the pipeline used in this study. 
‘TCGA report’ indicates publicly available somatic alteration calls. 

c, Distribution of mutation burdens in each cohort. Each dot corresponds to a 
mutation burden calculated from a tumour-normal pair. Red horizontal bars 
indicate the median mutation burden from the CPGEA pipeline and TCGA (both 
0.70 per Mb). d, Genomic regions with significantly recurrent somatic CNAs 
called by GISTIC2.0. e, Heat map showing genome-wide CNAs. Top, 114 
tumours clustered using the WGS-based CPGEA pipeline. Bottom, array-based 
TCGA results for the same tumours, arranged in the same order. f, Gene-level 
alteration frequencies from the two pipelines for the TCGA cohort. 

g, Alexandrov signatures in CPGEA and their association with clinical features. 
Top, percentage of samples per signature. Bottom, mutation counts for each 
signature, ordered from low to high by individual patient. h, Box plot showing 
the correlation of signatures 8 and 16 with Gleason score (Kruskal-Wallis test). 
Box plots as in Fig. 4b. Each dot corresponds to a tumour sample. 

Extended Data Fig. 3 | Landscape of structural variations in CPGEA. a, Types 
of structural variation and numbers for individual tumours (columns). 
Chromoplexy and chromothripsis status, CHD1 deletion status, and ERG fusion 
status are displayed as a heat map. b, Frequency of recurrent structural 
variations and their affected genes for five types of structural variation. 

c, A recurrent inversion potentially disrupts a TAD boundary and results in 
enhancer hijacking. HiC map for the LNCaP cell line over the inversion. The 
inversion and TAD boundaries are marked. Expression levels of potentially 
affected genes are displayed as box plots. P values were determined by 
two-sided Mann-Whitney U-test. Box plots are as in Fig. 4b. Each dot corresponds to 
a normal sample (n = 134), a tumour with no structural variation (wild-type 
(WT), n = 131), or a tumour with structural variation (n = 3). d, Definition of five 
tiers of structural variation patterns based on genomic annotation of the 5’ and 3’ 
breakpoints. e, Genomic location distribution of 5’ (left) and 3’ (middle) 
breakpoints, and distribution of different types of structural variation across 
the five defined tiers (right). 

Extended Data Fig. 4 | Landscape of gene fusions and SMGs in CPGEA. a, The 
circle represents gene fusions in Chinese and Western cohorts. Recurrent 
fusions (more than two samples) are displayed as connected gene pairs, in 
which the width of the connecting arc represents the number of samples that 
contained the fusion. Red indicates novel gene fusions not present in public 
databases (FusionHub). b, Fusion was validated by Sanger sequencing and 
RNA-seq data. Red cells indicate validated fusion events, and green cells 
indicate PCR failure. c, Circos plot displaying ETS family fusions. Expression 
levels are shown as a function of copy number. d, The SCHLAP1-UBE2E3 gene 
fusion. e, AMACR fusions. f, A heterozygous SND1-BRAF fusion found in CPGEA. 
g, In total, 83 SMGs were detected by MuSiC, including 7 genes called by both 
MuSiC and MutSigCV. h, Fraction of primary, metastatic, and other cancer 
types investigated by each study. i, Venn diagrams of SMGs defined in different 
studies. j, Genes significantly mutated in CPGEA, Western primary, and 
Western metastatic cohorts. Purple cells indicate that the gene was defined as 
an SMG in the study. h-j, The Western cohorts are from CPCG’, SU2C", T/C/B 
(Trento/Cornell/Broad, neuroendocrine prostate cancer)*®, B/C (Broad/ 
Cornell)’, CRC®, M/DFCP, TCGA2, Michigan", MSKCC®, Organoid’®, CNA- 
PNAS” and MSK”. 

Extended Data Fig. 5 | Noncoding mutations in CPGEA. a, Schematic 
workflow of noncoding mutation analysis in CPGEA. b, Distribution of 
noncoding mutations across different genomic features. c, Significance of 
mutation hotspots in noncoding regulatory regions. Each hotspot is 
colour-coded for its regulatory region annotation, and the statistical significance 
(false discovery rate (FDR)) and number of hits per sample are displayed. 

d, Significance of recurrent mutations in regulatory regions of interest. 
Regulatory regions for individual genes are displayed based on local and global 
measures of statistical significance (FDR). Colours indicate regulatory region 
annotations, and key genes are labelled. e, Enrichment of noncoding mutations 
resulting in gain or loss of transcription factor-binding sites. For each 
transcription factor, the match score to the position weight matrix (PWM) was 
determined for mutations that could potentially destroy or create a binding 
site for that transcription factor. Plotted for each transcription factor is the 
mean difference in the match scores for the mutated and reference alleles. Red 
indicates FDR < 0.05. P values for differences in mean match score were 
computed by two-sided paired Wilcoxon rank-sum test. f-h, Examples of 
noncoding mutations in selected genes. TBL1XR1 (f), FOXA1 (g) and FLI1 (h) are 
shown. Genome browser views show the location of the noncoding mutation. 
The genomic coordinates and types of noncoding mutation are labelled above 
the genome browser. Gene expression of genes with noncoding mutations is 
depicted. 


Extended Data Fig. 6 | FOXA1 mutations in CPGEA. a, FOXA1 mutation 
validation. Two representative validations by Sanger sequencing and 
reconstructed RNA-seq analysis. b, Validation of a FOXA1 in-frame 
deletion-derived peptide by mass spectrometry. c, Mapping of FOXA1 mutations onto 
the three-dimensional structure of FOXA1 and bound DNA (based on PDB 
registry 1VTN’). d, DNA methylation over FOXA1-binding sites in tumours with 
FOXA1 truncation/in-frame deletion. Top, FOXA1-binding motifs in the 
ENCODE chromatin immunoprecipitation with high-throughput sequencing 
(ChIP-seq) dataset (left) versus FOXA1-binding motifs outside of FOXA1 
ChIP-seq peaks (right). Bottom, wild-type FOXA1-binding sites (left) and mutant 
FOXA1-binding sites (right) from recently published ChIP-seq data**®. P values 
were determined by one-sided Mann-Whitney U-tests. Box plots are as in 
Fig. 4b. Each dot corresponds to a normal or tumour sample. e, Clonal analysis 
of FOXA1 in CPGEA. f, Mutual exclusivity or co-occurrence of gene alterations 
between genes belonging to 12 important curated pathways. Only alterations 
with at least one significant interaction (P < 0.05) are included. Asterisks 
indicate significant relationships. g, Allele frequency distribution of FOXA1 
mutations in CPGEA and TCGA processed with the CPGEA pipeline. 

h, Significant mutual exclusions and co-occurrences between FOXA1 
mutations and other genetic lesions in CPGEA, identified by OncoPrint from 
cBioPortal®. i, FOXA1 mutations and downstream pathways. Pairwise 
comparison of expression levels of important pathways. The z-score of specific 
genes and clinical features are displayed in a heat map grouped by different 
mutation subtypes. 


Extended Data Fig. 7| DNA methylation abnormalities in CPGEA. a, Heat map 
of DNA methylation levels in the CPGEA cohort. Rows represent defined 
genomic regions including PMDs, hypoDMRs and hyperDMRs, and columns 
represent samples. Tumours (right) and matched normal samples (left) are 
sorted by epimutation rate. In each category, genomic regions are sorted by 
chromosomal coordinates. The top panel shows clinicopathological features 
of patients (as in Fig. 1), genetic alterations including fusions and coding 
mutations, and other molecular phenotypes. b, Two-dimensional density plot 
of the average CpG methylation level in normal versus tumour samples from 
the same patient. c, Average methylation level of CpGs overlapping different 
genomic features. P values determined by two-sided Wilcoxon signed-rank 
test. CDS, coding sequence. Each dot corresponds to a normal prostate or 
tumour sample. d, Average methylation level of CpGs overlapping different 
repeat element classes. P values were determined by two-sided Wilcoxon 
signed-rank test. Each dot corresponds to a normal prostate or tumour sample. 
e, Average non-CG methylation level in tumours and matched normal samples. 
Each dot represents a sample. Mean 0.37% for each group. P values were 
determined by two-sided Wilcoxon signed-rank test. Each dot corresponds to a 
normal prostate or tumour sample. f, Genome-wide methylation levels in 100- 
kb bins, clustered across tumour samples. Rows represent samples, and 
columns represent 100-kb genomic bins, with the DNA methylation level of 
each bin represented by the heat map. g, The genome fraction of total PMD 
length in each tumour, in decreasing order. The leftmost bar represents the 
genome fraction of the union set of PMDs across all tumours. h, PMD 
recurrence. The red line represents PMDs shared by at least 100 tumours (711 
out of 2,218). i, Mutation frequency inside versus outside PMDs. P=7.5 x 10°, 
two-sided Wilcoxon signed-rank test. Mutation frequency was measured as the 
average number of SNVs per Mb. Each dot corresponds to a tumour sample 
(n = 187). j, Expression level of genes located in PMDs (n = 4,043) or outside 
PMDs (n = 15,344) in tumours versus matched normal samples. P values 
determined by one-sided Wilcoxon signed-rank test. Genes in PMDs had 
significantly lower expression than genes outside PMDs in both tumours and 
normal samples (P= 0, two-sided Mann-Whitney U-test). Outlier genes with 
very high expression were omitted from the plot. All box plots are as in Fig. 4b. 


Extended Data Fig. 8 | DMRs and CIMP in CPGEA. a, Recurrence of hypoDMRs. 
There were 1,172 hypoDMRs shared by at least 10 tumours (red line). b, 
Recurrence of hyperDMRs. There were 4,214 hyperDMRs shared by at 
least 10 tumours (red line). c, Genomic location of the union set of hypoDMRs 
and recurrent hypoDMRs. The innermost circle represents the reference 
genome background. d, Genomic location of the union set of hyperDMRs and 
recurrent hyperDMRs. The innermost circle represents the reference genome 
background. e, MSigDB perturbation enrichment analysis of recurrent 
hypoDMRs (n=1,172) using GREAT®”. f, Gene Ontology (GO) enrichment 
analysis of recurrent hyperDMRs (n=4,214) using GREAT. The top 20 GO 
biological process terms are shown. g, Scatter plots of example epigenetically 
silenced genes. Each dot represents a normal sample (red), a tumour without a 
silenced gene (blue), or a tumour with a silenced gene (black). TPM, transcripts 
per million. h, Heat map of CIMP-CGI methylation levels. Rows represent 
CIMP-CGIs, and columns represent samples. Tumours (right) were clustered by 
CIMP-CGI methylation levels, and matched normal samples (left) were sorted in the 
same order. CIMP-CGIs were sorted by chromosome and genomic coordinates. 
The top panel shows clinicopathological features of patients (as in Fig. 1), 
genetic alterations, including fusions and coding mutations, and other 
molecular phenotypes. i, Proportion of recurrent hyperDMRs overlapping 
CGIs. j, Association of CIMP+ tumours (n = 33) with gene mutation status. Red 
vertical line represents P = 0.05 (two-sided Fisher’s exact test). k, Kaplan-Meier 
plot of biochemical recurrence-free survival in patients with CIMP+ and CIMP− 
tumours. P values were determined by two-sided log-rank test. l, m, Correlation 
between epimutation burden and mutation (l) or CNA (m) burden. Spearman’s 
correlation coefficient ρ = 0.37, P = 2.5 x 10” for mutation burden, and ρ = 0.65, 
P = 1.2 x 10™ for CNA burden. Each dot represents a tumour (n = 187). 

Extended Data Fig. 9 | Molecular subtypes of prostate cancer. a, Molecular 
taxonomy across eight cohorts based on seven important oncogenic drivers 
identified by TCGA. b, Mutation burden, CNA burden and epimutation burden 
across the four molecular subtypes in CPGEA. c, Key CNA events, CIMP and 
fusion events across the four subtypes. ERG fusion-positive genes were 
combined results from Meerkat, SOAPfuse and high expression samples. 

d, Annotation of each molecular subtype. e, Kaplan-Meier plot of biochemical 
relapse-free survival for iCluster subtype D compared to the other three 
iCluster subtypes. P values were determined by two-sided log-rank test. 

f-h, Clustering of tumours using single datasets, using RNA-seq analysis (f), 
DNA methylation (g), and miRNA data (h). h, Rows represent miRNAs and 
columns represent tumours. The top panel shows clinical features of patients 
(as in Fig. 1) along with four miRNA clusters and four iCluster subtypes. i, Violin 


plots of mutation, CNA and epimutation burdens for four miRNA clusters. 
Mutation burden, P= 0.85, 0.43, 0.61, 0.58, 0.24 and 0.16, for the comparison 
between miRNA clusters of 1-2, 1-3, 1-4, 2-3, 2-4 and 3-4, respectively. CNA 
burden, P=5.9 x 10°, 0.00025, 0.29, 0.045, 1.3x10°, and 4.1x 10°, in the same 
order. Epimutation burden, P = 0.0052, 0.090, 0.24, 0.20, 6.1x 10> and 0.0080, 
in the same order. P values determined by two-sided Mann-Whitney U-test. 
Each dot corresponds to a tumour sample belonging to miRNA cluster 1 (n = 21), 2 
(n = 37), 3 (n = 34), or 4 (n = 13). j, Box plots of miRNA expression levels in normal 
samples and four miRNA-based tumour clusters (cluster 1 (n = 21), 2 (n = 37), 3 
(n = 34), or 4 (n = 13)). Box plots are as in Fig. 4b. k, Kaplan-Meier plot of 
biochemical recurrence-free survival in patients with tumours belonging to 
miRNA cluster 2 or other clusters. P values were determined by two-sided 
log-rank test. Primary tumours without any treatment were included. 

Extended Data Fig. 10 | Oncogenic pathways in prostate cancer. a, Summary 
of genetic and epigenetic lesions in 12 curated pathways across the Chinese 
prostate cancer subtypes. b, Comparison of the frequency of disturbances in 
the AR pathway between CPGEA (primary), TCGA (primary) and SU2C 
(metastasis) cohorts. The frequency of coding mutations in each AR pathway 
gene is shown. c, The frequency of fusions, structural variations, noncoding 
mutations and epimutations in each AR pathway gene in the CPGEA cohort. 
Information on additional pathways is provided at http://www.cpgea.com. 


d, Comparison of pathway-level alterations across the CPGEA (206 samples, 
excluding 2 microsatellite instability (MSI) samples), TCGA (114 samples 
processed with the CPGEA pipeline), and SU2C cohorts (150 samples 
downloaded from cBioPortal). To compare across cohorts, only coding 
mutations and CNAs were considered. e, Frequency of coding alterations 
(CNAs, fusion genes and nonsynonymous coding mutations), noncoding 
alterations, and both for each pathway in the CPGEA cohort. f, Different levels 
of actionable mutations predicted by OncoKB in CPGEA and TCGA. 

follows: Gleason score 6: 17 (8%); 7: 96 (46%); 8-10: 94 (45%). Organ-confined carcinoma (pT2) was found in 105 patients; 93 
showed extra-prostatic tumor extension (pT3); 10 patients had advanced disease which invaded bladder, rectum or pelvic 
muscles (pT4). 15 patients (7%) had bone metastasis at the discretion of the referring urologist, prior to surgery. 11 percent of 
patients who underwent the lymph nodes dissection had positive lymph nodes invasion. Our cohort also contained 18 patients 
who exhibited some level of metastasis at diagnosis, and they were hormone sensitive. 16 patients were treated with ADT 
were summarized in Supplementary Data 2. 


How the brain processes information accurately despite stochastic neural activity is a 
longstanding question’. For instance, perception is fundamentally limited by the 
information that the brain can extract from the noisy dynamics of sensory neurons. 
Seminal experiments?’ suggest that correlated noise in sensory cortical neural 
ensembles is what limits their coding accuracy* ©, although how correlated noise 
affects neural codes remains debated’ ™. Recent theoretical work proposes that how a 
neural ensemble’s sensory tuning properties relate statistically to its correlated noise 
patterns is a greater determinant of coding accuracy than is absolute noise 
strength” “. However, without simultaneous recordings from thousands of cortical 
neurons with shared sensory inputs, it is unknown whether correlated noise limits 
coding fidelity. Here we present a 16-beam, two-photon microscope to monitor 
activity across the mouse primary visual cortex, along with analyses to quantify the 
information conveyed by large neural ensembles. We found that, in the visual cortex, 
correlated noise constrained signalling for ensembles with 800-1,300 neurons. 
Several noise components of the ensemble dynamics grew proportionally to the 
ensemble size and the encoded visual signals, revealing the predicted 
information-limiting correlations” “. Notably, visual signals were perpendicular to the largest 
noise mode, which therefore did not limit coding fidelity. The information-limiting 
noise modes were approximately ten times smaller and concordant with mouse visual 
acuity”. Therefore, cortical design principles appear to enhance coding accuracy by 
restricting around 90% of noise fluctuations to modes that do not limit signalling 
fidelity, whereas much weaker correlated noise modes inherently bound sensory 
discrimination. 


The sensitivity and noise fluctuations of primary sensory neurons, such 
as photoreceptors or mechanoreceptors, limit the perception of weak 
stimuli* 78, although disagreement persists about which downstream 
noise sources limit perceptual discriminations when sensory inputs 
exceed detection thresholds*“. A groundbreaking experiment spurred 
this debate by identifying individual visual cortical neurons that signal 
visual attributes nearly as reliably as an animal’s perceptual reports”’. 
One proposed explanation is that similarly tuned cortical neurons 
might share positively correlated noise fluctuations that limit the per- 
ceptual improvements attainable by averaging signals from multiple 
cells with similar response properties** (Extended Data Fig. 1a-c). 
Theoretical studies show that positively correlated noise limits 
the information that cells with similar sensory-evoked responses can 
encode*>”, but this is not necessarily the case for ensembles of cells 
with diverse tuning properties® ”° (Extended Data Fig. 1d-f). A recent 
framework based on a feedforward neural network asserts that, in the 
space of all possible neural ensemble dynamics, it is only noise in the 
dimensions of sensory representations that constrains coding 
fidelity’ (Extended Data Fig. 1g-m). Previous experiments have examined 
noise in cell pairs, but this approach incurs substantial measurement 
errors??? and the results were conflicting**”! ~*. To our knowledge, no 
previous study has recorded neural ensemble noise patterns, related 
these to sensory signals, and tested the idea that only specific noise pat- 
terns confine the information encoded by large neural populations”. 


A multi-beam two-photon microscope 

To make such measurements, we built a laser-scanning two-photon 
microscope with a 4-mm² field of view for imaging across the span of the 
mouse primary visual cortex (V1). The microscope has 16 photodetec- 
tors and 16 corresponding beams, which originate from one laser and 
are focused 500 µm apart in the specimen in a 4 × 4 array (Fig. 1). Four 
beams are active at any instant, and switching to a different four beams 
takes about 50 ns; this enables scanning of a larger area per unit time 
than would be feasible with one beam and the same optics (Extended 
Data Figs. 2-4). Compared to 16 active beams, our approach yields 
fourfold greater fluorescence for any given time-averaged illumination 
power and delivers fourfold less heat to the brain for an equivalent 
rate of fluorescence emission (Supplementary Note). The active 
laser foci are >1 mm apart, so fluorescence scattering between the four 
active image tiles is <2%; scattering into inactive tiles can be corrected 
computationally using the 16 photocurrents (Extended Data Fig. 4). 
Our system images neocortical activity down to layer 5 with full-frame 
acquisition rates of 7.23-17.5 Hz (Supplementary Videos 1-3), whereas 
other two-photon microscopes with large fields of view attain similar 
imaging rates over smaller sub-fields** ”’ (Extended Data Fig. 2j, k). 

Fig. 1 | Two-photon Ca²⁺ imaging over a 4-mm² field of view. a, Schematic of 
the microscope. Sixteen laser beams converge on a pair of galvanometer 
mirrors (X- and Y-galvos). Sixteen photomultiplier tubes (PMTs) detect 
fluorescence. b, Two-photon image (greyscale, mean of 1,000 frames taken at 
7.23 Hz) of GCaMP6f-expressing layer 2/3 pyramidal neurons in the visual 

Imaging studies across cortical area V1 


We studied layer 2/3 pyramidal neurons, which project extensive 
connections from V1 to higher visual areas. In awake mice expressing
the Ca²⁺-indicator GCaMP6f in these neurons, we imaged around
1,000-2,000 cells concurrently as mice viewed, with one eye, a random
sequence of moving gratings. Each grating was oriented at either +30°
or -30° from vertical, lasted 2 s and spanned the central ~50 deg of the
eye’s visual field (Fig. 2a-c). There were 350 trials with each stimulus,
but because locomotion modulates vision?’ we analysed only trials 
with locomotor speeds of less than 0.2 mm s⁻¹ (217-332 trials per stimulus).
From these recordings we extracted 8,029 neurons, mainly in V1
(1,031-2,191 cells in each of 5 mice; Extended Data Figs. 5, 6). 

A total of 5,008 cells responded at least weakly to the stimuli, with 
activity rates and stimulus preferences consistent with those found in 
previous studies”*”’ (Extended Data Fig. 6a-d). These neurons likely had 
substantially overlapping inputs, because mouse V1 neurons respond to 
large portions of the visual field that are comparable in size to our stim- 
uli?’. Noise correlation coefficients in pairs of concurrently recorded 
cells were widely distributed, with positive mean values (r = 0.06 ± 0.01;
mean ± s.d.; 5 mice) as in most previous reports’ (Fig. 2d-g, Extended
Data Fig. 6e-i). Active cell pairs that on average responded similarly to 
the two stimuli had, on average, noise correlation coefficients about 
twice as large as those that responded dissimilarly (Fig. 2f, g). 

To evaluate the significance of these correlations, we created trial- 
shuffled datasets in which the responses of each cell were permuted 
across different trials, thereby mimicking cells with statistically 


cortex of an awake mouse. Overlaid are >2,000 neuronal somata (green) 
identified in the Ca²⁺ video. Boxed areas are magnified in c. c, Magnifications of
the boxed areas in b. d, Example Ca²⁺-activity sources (computationally
identified), revealing dendrites. Images in b-d are representative of results
from 10 mice. Scale bars: b, 250 µm; c, d, 100 µm.


identical individual responses as in the real data but with uncorrelated 
noise fluctuations. Non-zero noise correlations in trial-shuffled datasets 
merely reflect the finite number of trials. Indeed, noise correlation 
coefficients were more narrowly distributed than in real data, although 
many deviated substantially from zero (Fig. 2d, g). This confirms the 
difficulty of measuring noise correlations given limited trials’ and 
likely explains why previous studies of cell pairs yielded divergent 
results*67!, 
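
The trial-shuffling control described above can be illustrated with a minimal sketch. The synthetic data, the shared-fluctuation strength of 0.3 and the function names are illustrative assumptions, not the analysis code used in the study; in a real dataset the shuffle would be applied separately to the trials of each stimulus.

```python
import numpy as np

def noise_correlations(responses):
    """Pairwise Pearson r of trial-to-trial fluctuations for one stimulus.

    responses: array of shape (n_trials, n_cells).
    Returns the r values for all distinct cell pairs.
    """
    r = np.corrcoef(responses, rowvar=False)   # corrcoef removes each cell's mean
    upper = np.triu_indices(r.shape[0], k=1)   # distinct pairs only
    return r[upper]

def trial_shuffle(responses, rng):
    """Permute each cell's responses across trials, independently per cell.

    Each cell keeps its own response distribution, but fluctuations shared
    between cells on the same trial are destroyed.
    """
    shuffled = np.empty_like(responses)
    for cell in range(responses.shape[1]):
        shuffled[:, cell] = rng.permutation(responses[:, cell])
    return shuffled

rng = np.random.default_rng(0)
n_trials, n_cells = 300, 50                      # hundreds of trials, as in the text
shared = rng.normal(size=(n_trials, 1))          # hypothetical common fluctuation
private = rng.normal(size=(n_trials, n_cells))   # independent noise per cell
responses = 0.3 * shared + private

shuffled = trial_shuffle(responses, rng)
r_real = noise_correlations(responses)
r_shuf = noise_correlations(shuffled)
print("real mean r:", r_real.mean())
print("shuffled mean r:", r_shuf.mean(), "shuffled s.d.:", r_shuf.std())
```

In this toy example the shuffled coefficients centre on zero but keep a spread of roughly 1/sqrt(n_trials), which is the finite-trial effect that the shuffled histograms expose.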


Evaluations of cortical coding fidelity 


To study visual coding, we represented the dynamics using a population 
vector (one cell per dimension) and used the discriminability index, 
d’, to assess the statistical confidence in distinguishing the stimuli on
the basis of their evoked neural responses””. (d’)² relates to the Fisher
information that the cell ensembles convey about stimulus identity*?”°, 
which even for binary classifications (<1 bit of Shannon entropy) can 
be infinite—that is, 100% confidence*. Theories of noise correlations 
and neural coding have largely examined pairwise discriminations, as 
error rates discriminating more than two stimuli are well approximated 
using d’ values from all the pairwise comparisons”. 

To enable us to determine d’ accurately despite having about 5- to 
10-fold fewer trials than cells recorded per mouse, we created analyses 
to extract the primary, ensemble noise modes without measuring noise 
in cell pairs (Appendix). First, we performed a dimensional reduction 
by using partial least squares (PLS) analysis to identify and retain only 
five population vector dimensions in which the stimuli were highly 
distinguishable; retaining more than five dimensions only added noise 
and decreased the ability to distinguish the stimuli (Fig. 3a, b, Extended 
Data Figs. 5b, 7a—c). In this five-dimensional representation, the neural 
dynamics evoked by the two stimuli became distinguishable over the 
first ~0.5 s of stimulus presentation (Fig. 3b-d). Using an optimal linear
decoder of the ensemble activity, d’ values rose to a plateau within 
~0.5s of the stimulus onset; the optimal decoder then remained stable 
until stimulus offset (Extended Data Fig. 7d). In shuffled datasets the 
stimuli were even more distinguishable, as d’ values attained greater 
values than in real datasets, indicating that correlated noise degrades 
stimulus representations in the real data. 

We also evaluated decoders that ignore noise correlations. ‘Diagonal 
decoders’, which neglect off-diagonal elements of the noise covariance 
Fig. 2 | Noise correlations of cell pairs are difficult to estimate from
hundreds of stimulus trials. a, Image of visual cortex, processed as in Fig. 1b. The
coordinate system indicates anterior (A) and lateral (L) directions. The red line
marks the area V1 boundary, found by retinotopic mapping. Scale bar, 500 µm.
b, Top, in each trial, one of two randomly chosen stimuli (A or B) appeared for 2
s, followed by a uniform background for 2 s. Bottom, each stimulus was a
drifting grating, oriented at either +30° or -30° from vertical. The analyses in
d-g used 217-332 trials per stimulus in each of 5 mice. c, Example Ca²⁺ activity
traces. F, fluorescence intensity. d, Histograms of noise correlation 
coefficients (Pearson’s r) for concurrently imaged cell pairs (6,946,280 cell 
pairs; 5 mice), computed using the estimated spike count of each cell within 
[0.5 s, 2 s] of stimulus onset. r values are averages across both stimuli for real
and trial-shuffled datasets. The latter histogram was Gaussian
(R² = 0.9982 ± 0.0005 (95% confidence interval)) with a variance around 50% of
that of the real data, showing the difficulty of accurately determining pairwise 
noise correlations with hundreds of trials. Error bars estimated as counting 
errors are too small to see. e, Histograms of noise correlation coefficients 
differed significantly for cell pairs with similarly or differently tuned mean 
responses to the two stimuli, computed for the top 10% most active cells and by 
grouping cell pairs into those with positively or negatively correlated mean 
responses to the two stimuli. (***P<1.3 x 10 ° for all 5 mice; two-tailed 
Kolmogorov-Smirnov test; 901 cells, 43,887 positively and 43,768 negatively 
correlated pairs). For exact P values for this and all subsequent figures,
see Supplementary Information. f, g, Box plots of mean (f) and full width at half
maximum (FWHM) (g) values of the colour-corresponding distributions in d, e.
Circles indicate data points for 5 individual mice. Noise correlations in f were
greater for cell pairs with similarly tuned responses (one-tailed Wilcoxon rank 
sum test, ***P< 0.001 for all 5 mice). Extended Data Fig. 6g-i shows results for all 
cell pairs. Boxes cover the middle 50% of values, horizontal lines denote 
medians, and whiskers span the full range of the data. 


matrix®’, performed nearly as well as optimal linear decoders, although 
the decrement was statistically significant (Fig. 3d—h). Thus, although 
correlated neural noise degraded stimulus encoding, using the noise 
structure to improve decoding brought only modest benefit. 
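
The comparison between optimal, diagonal and shuffled decoding can be sketched on a toy model. The example assumes Gaussian responses with a stimulus-independent noise covariance C and a mean response difference Δμ, and it uses a hypothetical covariance, C = I + ε Δμ Δμᵀ, in which the correlated noise lies along the signal direction. The function names and parameter values are illustrative and are not taken from the study.

```python
import numpy as np

def d_prime_optimal(delta_mu, cov):
    """d' of the optimal linear decoder, whose weights are w = C^-1 delta_mu."""
    return float(np.sqrt(delta_mu @ np.linalg.solve(cov, delta_mu)))

def d_prime_diagonal(delta_mu, cov):
    """d' of a decoder whose weights ignore off-diagonal covariance terms,
    scored against the full, correlated noise."""
    w = delta_mu / np.diag(cov)
    return float((w @ delta_mu) / np.sqrt(w @ cov @ w))

def d_prime_shuffled(delta_mu, cov):
    """d' after trial shuffling, which leaves only the diagonal of C."""
    return float(np.sqrt(np.sum(delta_mu ** 2 / np.diag(cov))))

rng = np.random.default_rng(0)
n_cells = 200
delta_mu = 0.1 * rng.normal(size=n_cells)   # toy difference in mean responses
eps = 1.0                                   # toy strength of correlated noise
cov = np.eye(n_cells) + eps * np.outer(delta_mu, delta_mu)

d_opt = d_prime_optimal(delta_mu, cov)
d_diag = d_prime_diagonal(delta_mu, cov)
d_shuf = d_prime_shuffled(delta_mu, cov)
print("optimal:", d_opt, "diagonal:", d_diag, "shuffled:", d_shuf)
```

Under these assumptions the shuffled d’ exceeds the optimal d’ of the correlated data, while the diagonal decoder falls only slightly short of the optimal one, which mirrors the qualitative pattern reported above.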

The stability of the optimal decoder across most of the stimulus 
duration suggested that, by integrating neural activity across the 
stimulus presentation, the brain might in principle average out noise 
in its sensory representations to improve discrimination. To test this, 
we examined the optimal linear decoder of the time-integrated neu- 
ral responses over each trial, which indeed yielded greater d’ values 
(Extended Data Fig. 7e). For comparison, we examined decoders of 
the cumulative set of neural responses that had occurred up to each 
moment in the stimulation trial (Fig. 3e-h). Cumulative decoders surpassed
those using individual time-bins of neural activity, but not the
simple decoder of time-integrated activity (Extended Data Fig. 7e). 
This suggests that there was little temporal structure in the sustained 
neural responses that might improve decoding beyond that attained 
using time-integrated activity, at least as reported by Ca²⁺ imaging.

We next examined how decoding varied with n, the number of cells 
analysed. In the absence of correlated noise, each additional cell used 
should linearly increase the Fisher information that is conveyed about the 
identity of the stimulus*™. Trial-shuffled datasets confirmed this, as (d’)²
increased linearly with n (Fig. 3f, g). In real data, (d’)² reached a plateau
when n exceeded ~1,000 cells, for both instantaneous and cumulative
decoders (Fig. 3f-i). This constitutes direct evidence of information saturation
in large neural populations, without extrapolations from cell pairs.

Several control analyses bolstered these conclusions. First, we vali- 
dated linear decoding as a way of assessing Fisher information. The 
noise covariance matrix was stimulus-independent, with similar matrix 
elements for both stimuli (r = 0.81 ± 0.16; mean ± s.d.; 20 off-diagonal
matrix elements for each of 5 mice). Thus, nonlinear decoders should 
have similar accuracy as the optimal linear decoder, which we con- 
firmed by quantifying the additional information that an optimal quad- 
ratic decoder could extract from the data (Extended Data Fig. 7f-h). 
Second, we verified that there were a sufficient number of trials to 
estimate d’ accurately. In every mouse the empirically determined 
values of d’ approached a stable estimate with increasing numbers of
trials and were stationary across the imaging session (Extended Data
Fig. 7g, i, j). Third, we confirmed that alternative decoding methods
using regularized regression yielded similar d’ values and identical 
conclusions to those from PLS analysis (Extended Data Fig. 8a, b). 
Further, we used regularized regression to analyse publicly available 
neural activity datasets”, which also showed that d’ reached a plateau 
(Appendix). Fourth, we used simulations to verify that our decoders 
were robust to potential large sources of neural variability, such as com- 
mon mode noise and gain modulation of visual responses (Extended 
Data Fig. 8c—h). Fifth, we mathematically derived the accuracy of d’ 
determinations made via PLS analysis (Appendix). Altogether, numer- 
ous analyses and derivations upheld the information saturation that 
we found in ensembles of ~1,000 neurons or more.

The data also enabled us to test a framework for understanding 
cortical noise fluctuations based on a feedforward network”. In this
framework, the encoded information, I, as a function of the ensemble
size, n, obeys I(n) = (I₀n)/(1 + εn), where the constant I₀ is the mean
encoded information per cell in the shuffled data and the parameter
ε characterizes the strength of information-limiting correlations”.
Our data matched this prediction (Fig. 3f, g), verifying the existence
of information-limiting correlations and establishing the effect size.
The minimum set of cells needed to detect information saturation is
approximately 2ε⁻¹, which is around 800-1,500 cells for the instantaneous
decoders (Fig. 3h, i). This shows the importance of large recordings
to adjudicate whether correlated noise limits coding accuracy, and
likely explains why previous recordings of fewer than 350 cells did not
observe information saturation”. 
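
As a worked illustration of the saturation law, the sketch below evaluates I(n) = I₀n/(1 + εn) using the fitted value ε = 0.0019 reported in the caption of Fig. 3. Setting I₀ = 1 (arbitrary units) and the function name are assumptions made only for illustration.

```python
def encoded_information(n, i0, eps):
    """Saturating information curve I(n) = I0 * n / (1 + eps * n)."""
    return i0 * n / (1.0 + eps * n)

eps = 0.0019   # fitted value quoted in the Fig. 3 caption
i0 = 1.0       # information per cell in shuffled data (arbitrary units, assumed)

asymptote = i0 / eps   # limit of I(n) as n grows without bound
n_half = 1.0 / eps     # ensemble size at which I(n) is half the asymptote
n_detect = 2.0 / eps   # the approximate 2/eps detection scale from the text

for n in (350, round(n_half), round(n_detect), 5000):
    fraction = encoded_information(n, i0, eps) / asymptote
    print(n, "cells:", round(fraction, 2), "of the asymptotic information")
```

With this value of ε the half-saturation point is about 526 cells and 2ε⁻¹ is about 1,053 cells, at which point the curve has reached two-thirds of its asymptote. A recording of 350 cells reaches only about 40% of the asymptote, where the curve is still hard to tell apart from linear growth.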


Comparing neural coding to visual acuity 


An additional benefit of recordings across V1 is to enable estimates of
the attainable perceptual acuity given only the information encoded 
in the early visual cortex, which is important for fine discriminations 
of grating stimuli. To approximate conditions more representative 
of the perceptual threshold, we examined another 5 mice that viewed 
the same grating stimuli as before but with ±6° orientations, closer to
the discriminability limits.

As expected, these stimuli were harder to distinguish from their 
evoked neural activity (Extended Data Fig. 9). The asymptotic d’
value (~2.5) for large n suggests that gratings presented at ±2.4° under
otherwise identical viewing conditions would have the minimal, perceptibly
distinct orientations (d’ ~1). Behavioural studies of mouse visual
spatial acuity under photopic illumination” yield similar predictions
of ±2.3° (Methods). Direct measurements of mouse visual orientation
sensitivity have been sparse and used different stimuli from ours,
but yielded similar values**. The fine agreement in these numbers is 
probably fortuitous, but the similar values estimated from cortical 
responses and behavioural studies*** suggest that the information 
signalling limits of visual cortical coding likely have an important role 
in setting perceptual bounds. 
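
The extrapolation from the measured d’ to a threshold orientation can be reproduced with one line of arithmetic. The sketch assumes that d’ scales linearly with the orientation offset for small offsets; that assumption is consistent with the numbers quoted above, but it is stated here as an assumption rather than as the exact procedure of the study.

```python
measured_offset_deg = 6.0   # gratings were shown at +/-6 degrees
measured_d_prime = 2.5      # asymptotic d' for large n (approximate)
threshold_d_prime = 1.0     # criterion for minimally distinct stimuli

# Linear scaling (assumed): offset at threshold = offset * d'_threshold / d'_measured
threshold_offset_deg = measured_offset_deg * threshold_d_prime / measured_d_prime
print("threshold orientation: +/-", threshold_offset_deg, "degrees")
```

The result, ±2.4°, matches the value given in the text and is close to the ±2.3° derived from behavioural acuity.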
Fig. 3 | Correlated noise limits the information conveyed by cortical neural
ensembles. a, Schematic of neural ensemble dynamics in a population vector
representation of reduced dimensionality. Trajectories, r_A(t) and r_B(t), depict
single-trial responses to different (red, blue) stimuli. At a fixed time after
stimulus onset, the sets of responses to the two stimuli form two distributions
of points (ellipses). At the bottom left are projections of these distributions
onto a subspace, found by PLS analysis, in which responses to the two stimuli
are most distinct. The green line indicates the optimal linear boundary for
classifying stimuli in this subspace. The stimulus discriminability, d’, equals the
separation, Δμ, of the two distributions along the dimension orthogonal to this
boundary, divided by the s.d., σ, of each distribution along this dimension.
b, c, Neural ensemble responses, 0.15 s (left) and 1 s (right) after stimulus onset,
in the two-dimensional space in which the sets of responses to the two stimuli
are most distinct, for real (b) or trial-shuffled (c) datasets. The blue and red
crosses denote individual trials (220 trials per stimulus); the green and orange
lines mark the classification boundaries for real and trial-shuffled data,
respectively; and the vertical black line in b is the classification boundary for
diagonal discrimination, which ignores correlations in the responses of the
cells. w_optimal, w_shuffled and w_diagonal represent directions normal to the
classification boundaries. w_shuffled = w_diagonal, as the corresponding classification
boundaries are identical. d, Mean values of d’ (coloured data points; N = 5 mice)
plotted as a function of time after stimulus onset, for the classifiers in b, c. Error
bars represent the standard deviation. Coloured lines show the d’ values for
individual mice, computed using the protocol of Extended Data Fig. 5b and


Origins of information-limiting noise 

To identify why the information saturates, we analysed the neural noise 
structure by finding the principal eigenvectors of the neural noise 
covariance matrix and the mean amplitudes of visual signals encoded 
along each of these eigenvectors. This allowed us to decompose (d’)²
into a sum of signal-to-noise ratios, one for each eigenvector” (Methods).
Although visual signal amplitudes increase linearly with ensemble
size, n (Fig. 4a, b), certain noise eigenvalues might also increase with n,
which could offset the greater signalling capacity of a larger ensemble
and cause the information saturation. 

We developed methods to determine the principal eigenvectors of 
the noise covariance matrix without needing accurate estimates of 
its matrix elements, a key distinction from previous analyses”.
Contrary to prevailing thinking, with our approach recordings of
more cells enable accurate estimates of these eigenvectors and of
d’ using fewer trials (Extended Data Fig. 10). As n increased, mean
ensemble responses to the two stimuli became increasingly dis- 
tinct while staying aligned to the dimensions important for optimal 
decoding (Fig. 4b, c). In real but not in shuffled datasets the noise 
covariance matrix had 2-3 eigenvalues that also increased linearly 
with n (Fig. 4d, e). We examined how these particular noise eigen- 
modes related to the dimensions in which the neural ensembles 
represented visual signals. 
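
The decomposition of (d’)² into per-eigenvector signal-to-noise ratios can be checked numerically. For an optimal linear decoder with noise covariance C, mean response difference Δμ, and eigenpairs (λ_α, e_α) of C, the identity is (d’)² = Σ_α (Δμ·e_α)²/λ_α. The sketch below verifies it on a random five-dimensional example; the covariance and Δμ are synthetic assumptions, not measured values.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 5                                  # reduced dimensionality, as in the PLS space
a = rng.normal(size=(dim, dim))
cov = a @ a.T + np.eye(dim)              # synthetic positive-definite noise covariance
delta_mu = rng.normal(size=dim)          # synthetic difference in mean responses

# Eigen-decomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are the e_alpha

signal = (eigvecs.T @ delta_mu) ** 2     # squared signal along each eigenvector
snr = signal / eigvals                   # signal-to-noise ratio per noise mode

d2_from_modes = snr.sum()
d2_direct = delta_mu @ np.linalg.solve(cov, delta_mu)
print("sum of mode SNRs:", d2_from_modes, "direct (d')^2:", d2_direct)
```

The identity makes the geometric argument explicit: a noise mode with a large eigenvalue reduces (d’)² only to the extent that Δμ has a component along it.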


averaged over 100 different randomly chosen subsets of 1,000 cells and 
randomly chosen assignments of trials to decoder training sets and test sets in 
each mouse. d’ values are normalized by those obtained for trial-shuffled data 
(averaged across 0.83-1.11 s). e, Same as d but using cumulative decoding,
which considers the full time-course of the activity of each cell up to time t. For
each mouse, d’ values in e-h have the same normalizations as in d. f, (d’)² values
during the interval 0.83-1.11 s from stimulation onset, plotted against the
number of cells, n, used for analysis. Data points in f-i are averages over 100
different subsets of cells, and the shading in f, g indicates the standard
deviation. For real data, (d’)² values were well fit by the expression
(d’)² = (d’)²_shuffled/(1 + ε × n) (green curves; R² = 0.88 ± 0.03 (s.d.);
ε = 0.0019 ± 0.0007; 5 mice), where ε is the fit parameter and (d’)²_shuffled is the
(d’)² value for n cells in a linear regression to the shuffled data (orange lines).
g, Same as f, but for (d’)² values computed using cumulative decoding for the
interval 0-1.11 s. h, i, Asymptotic d’ values in the limit of many cells (h) and the
number of cells at which (d’)² attains half its asymptotic value (i) determined
from curve fits as in f, g for instantaneous (open boxes) and cumulative (filled
boxes) decoding. Optimal linear decoders (green) slightly but significantly
outperformed diagonal decoders (black) (**P<10™; one-tailed Wilcoxon rank
sum test; N = 100 different assignments to decoder testing and training sets
using all cells recorded in each mouse; dots are mean values from individual
mice). Boxes cover the middle 50% of values, horizontal lines denote medians,
and whiskers span the full range of the data. Analyses in d-i are based on 217-332
trials per stimulus in each of 5 mice and time bins of 0.275 s.


In every mouse the visual signalling dimensions were nearly orthogonal
to the largest noise mode, which therefore had almost no effect on
coding fidelity even though it was around tenfold greater than any other 
noise mode (Fig. 4e-h; Extended Data Fig. 10). Instead, it was the third- 
largest noise mode that primarily aligned with the visual coding dimen- 
sions and thereby limited coding accuracy (Fig. 4f-h). These properties 
were sometimes seen, to a lesser extent, in the second-largest mode. 
The existence of noise eigenvectors that closely align to the dimen- 
sions used for visual representations and have eigenvalues that grow 
with n explains the information saturation for large n and why there 
was little performance decrement for decoders that did not account 
for correlated noise. Although these inferences rely on Ca²⁺ signals,
not electrical recordings, this is unlikely to affect the conclusions,
as variability in how spikes produce Ca²⁺ signals arises mainly from
fluctuations in Ca²⁺ levels, photon emission and detection, which are
statistically independent across cells and are not information-limiting. 

A key question is how information-limiting noise arises. Recent
work examines this issue in a two-layer, feedforward network model with 
sensory inputs and intrinsic noise in both its input and its output layers”. 
As more cells are added to the output layer, the encoded information 
approaches a plateau, the value of which depends on the noise levels 
and synaptic weights” (Extended Data Fig. 1j-m). Our re-analysis of 
this model” revealed that the dimensionality of the space of receptive 
fields in the output layer equals the number of noise covariance matrix
Fig. 4 | The largest noise mode is orthogonal to the dimensions encoding
sensory information. a, Schematics of trial-to-trial variability in ensemble
neural responses with increasing numbers of cells, n. Ellipsoids represent 1 s.d.
fluctuations around mean ensemble responses, r(s), to two similar stimuli
parameterized by a variable s (here the stimulus orientation). For large n,
response variability along the tuning curve, r(s), increases proportionally to
the separation, Δμ, between the two mean responses, leading to a saturation
of d’. w is the normal to the optimal linear classification boundary between the
two response sets. e₁, e₂ and e₃ are three eigenvectors, e_α, of the noise covariance
matrix, averaged across both stimuli. The eigenvalues, λ_α, are the noise
variances along each eigenvector. b, Mean values of (Δμ)² plotted against n for 5
mice in units of the variance in the shuffled datasets, which have isotropic noise
covariance matrices. Analyses in b-h used instantaneous decoding in the
five-dimensional space found by PLS analysis and 100 different randomly chosen
subsets of cells and assignments of trials to decoder training sets and test sets.
Given these 100 sets of results, lines and shading in b-f denote mean ± s.d.,
g shows 100 individual results, and h has box plots. c, Cosine of the angle
between Δμ and w plotted against n, for 5 individual mice in real and
trial-shuffled datasets. Because Δμ is nearly collinear with w, optimal linear
decoding, which accounts for noise correlations, only modestly outperforms
diagonal decoding, which does not (Fig. 3h). d, Eigenvalues, λ_α, for the
eigenvectors best-aligned with Δμ in 5 individual mice (green lines) increase
linearly with n, revealing the origin of information-limiting correlations. For
trial-shuffled data (25 orange lines, 5 eigenvalues for each of 5 mice), the noise
variance along Δμ is independent of n and is uniform for all eigenvectors of the
noise covariance matrix. e, f, The geometric relationships between visual
signals and noise indicate that the largest noise mode is not the one that is
information-limiting. Each colour denotes a different eigenvector, e_α, of the
noise covariance matrix in the reduced five-dimensional space, α ∈ {1, 2, 3, 4, 5}.
In each individual mouse (e) there were multiple eigenvalues, λ_α, of the noise
covariance matrix that increased with n. Extended Data Fig. 10 shows results
for all mice. Visual signals (f) also increased with n, as shown by decomposing
Δμ into components along the five eigenvectors, e_α. In each mouse the
eigenvector with the largest eigenvalue, e₁, was the least well aligned with the
visual encoding direction, Δμ (compare the red curves in e, f). g, A plot of noise
values computed as in e against signal values computed as in f, using all
recorded cells from mouse 1. The largest noise mode (red points) is an order of
magnitude greater than the noise modes that limit neural ensemble signalling
(green and yellow points), yet it is the least aligned with the signal direction.
h, Signal-to-noise ratios for all five eigenvectors, computed using the values
in g. (d’)² equals the sum of these five signal-to-noise ratios. Boxes cover the
middle 50% of values for the same 100 data subsets used in e-g, horizontal lines
denote medians, and whiskers span 1.5 times the interquartile range. Analyses
in b-h are based on 217-332 trials per stimulus in each of 5 mice.


eigenvectors for which the eigenvalues increase linearly with the number 
of output cells (Appendix). This shows that information-limiting correla- 
tions arise even in rudimentary networks, and reflect the co-propagation 
of signals and noise through the same synaptic connections. 
Discussion


Our findings address longstanding questions about how the brain computes
accurately despite neural noise’, and help to resolve a 30-year-old
puzzle by providing direct evidence that correlated noise limits cortical 
coding accuracy” *. These results adjudicate against models in which 
noise correlations do not limit—or even improve—cortical ensemble 
coding”’. Encoded visual signals in our recordings were orthogonal to 
the largest noise eigenmode, enhancing coding accuracy by restricting 
~90% of noise fluctuations to dimensions that did not impede signal- 
ling. This strategy allows cortical codes to evade a majority of noise, 
although coding fidelity is ultimately bounded by the weaker correlated 
noise patterns that cannot be disambiguated from signal. (This strategy
might not apply to sensory variables, such as full-field luminance,
that animals rarely use for fine discriminations.) In support of these 
conclusions, mouse visual acuity measured using stimuli similar to 
ours is around tenfold better than would be predicted from the 
total noise amplitude in the visual cortex, but fits with the amplitudes 
of the information-limiting noise modes. 

Nevertheless, rigorous comparisons between the accuracies of sen- 
sory cortical coding and psychophysical discriminations will require 
concurrent evaluations in individual animals, using identical stimuli. 
Visual stimuli of greater size can increase d’ values™ by decreasing
the mean level of shared inputs among responsive cells and thereby
reducing ε, whereas stimuli of greater saliency should increase d’ by
increasing I₀. The recent history of sensory stimuli will also influence
d’ owing to sensory adaptation. Although specific values of d’ will vary 
across stimulus types, information-limiting noise correlations and 
the saturation of information for large n arise generically from the 
propagation of signals and noise through common circuitry and place 
fundamental constraints on coding accuracy. Therefore, our experi- 
mental results likely reflect basic attributes of hierarchical networks 
and should generalize to diverse stimuli and sensory modalities. 

The brain probably cannot learn its own correlated noise structure 
to decode sensory features optimally, as any particular sensory scene 
almost never repeats precisely. Nonetheless, decoders that ignore 
noise correlations can still be near optimal (Fig. 3d, e, h, Extended Data 
Fig. 9c), as predicted for large networks with information-limiting 
noise correlations”. Therefore, information-limiting cortical noise 
might help downstream circuits to read out diverse sensory features
nearly optimally. 

Future work should extend our experiments to different stimuli, 
sensory modalities and behavioural conditions. Together with our 
analyses tailored for large-scale recordings, microscopes that image 
multiple brain regions concurrently* °* will enable studies of noise 
correlations and information flow across successive cortical areas. Such 
measurements will help to address longstanding questions about the 
decoding strategies that the brain uses for perception, and the effect 
of attention on perceptual sensitivity and neural ensemble noise. 
Methods


Microscope design 

We used a systems-engineering approach to design the two-photon 
microscope. To simulate its optical performance and assess component 
suitability, we used optical design software (ZEMAX) to simulate both 
ray and wave propagation through the optical pathway. To validate the 
multiplexing strategy (Extended Data Fig. 2b-d) and the computational 
un-mixing of crosstalk between image tiles (Extended Data Figs. 3c-e, 
4a-c), we simulated fluorescence scattering in brain tissue using the 
non-sequential mode of ZEMAX. We created an optomechanical design 
of the microscope using CREO Parametric 3.0 CAD mechanical design 
software. 


Laser source and control of illumination 

We used an ultrashort-pulsed Ti:sapphire laser (MaiTai eHP DeepSee; 
Spectra Physics) with an 80 MHz repetition rate. We tuned the emis- 
sion wavelength to 910 nm and used the laser’s built-in pre-chirping 
module to attain pulses of 130 + 20 fs duration (FWHM) at the sample 
plane. For general purpose routeing of the laser light to and within the 
microscope we used broadband dielectric mirrors (BB1-E03, Thorlabs). 
A computer-driven rotating half-wave (λ/2) plate (WP, AHWP05M-980;
Thorlabs) controlled the laser beam polarization and hence the power 
transmitted through a polarizing beam splitter (PBS) (PBS102, Thor- 
labs) and into the microscope’s illumination pathway (Extended Data 
Fig. 2d). To block all laser illumination to the microscope during the 
turnaround portion of the fast galvanometer mirror’s scanning cycle, 
we used a custom laser chopper wheel (90:10 duty ratio), positioned 
after the PBS and synchronized in frequency and phase with the
fast-axis galvanometer cycle.


Multiplexing of the 16 illumination pathways 
Owing to the powerful ultrafast lasers that are now commercially available,
past users of two-photon microscopy have often had more than
enough illumination power at their disposal but remained limited with 
regards to the imaging speeds and the fields of view that were attain- 
able with a single beam and existing scanning hardware. We therefore 
developed a multi-beam, two-photon microscope that puts the (previ- 
ously) excess laser power to good use, by using multiple beam paths 
that enable the coverage of larger fields of view at faster image-frame 
acquisition rates. The Supplementary Note, Extended Data Fig. 2j, k 
and Supplementary Fig. 1 quantitatively compare our imaging system 
to other recent approaches to large-scale two-photon microscopy. 
To steer laser illumination into four different sets of four beam paths,
we used three pairs of electro-optic modulators (EOM) (LM0202 3 x 3 
mm 5W, LIV20 pulse amplifier; QlOptic) and PBS cubes (PBS102, Thor- 
labs) (Extended Data Fig. 2d). We drove each EOM with a high-voltage 
(310 V amplitude) square wave oscillation, with the period matched to 
that of the microscope’s pixel clock. When imaging using the 4 x 4 set of 
beams, the square waves driving the second and third EOMs were both 
phase-shifted by ¼ period relative to the square wave driving the first
EOM (Extended Data Fig. 2c). By toggling the beam exiting each EOM
between the two linear orthogonal polarization states (the transition
time between polarizations was around 50 ns), these three square-wave
signals steered the beam from the laser successively into each of the four
sets of four beam paths (that is, 16 total), with each set of four illuminated
for ¼ of each pixel clock cycle (Extended Data Fig. 2b-d). Within each
set, three beamsplitters (10RQ00UB.2 and 10RQ00UB.4, respectively,
for S and P polarizations; Newport) divided the beam power equally 
between four different paths corresponding to four non-neighbouring 
image tiles in the 4 x 4 array (Extended Data Fig. 2b). Because the effi- 
ciency of two-photon fluorescence excitation increases as the square 
of the peak illumination intensity, this temporal multiplexing scheme 
enabled fourfold greater fluorescence excitation compared with an 
otherwise identical, 4 x 4 set of beams that were not multiplexed in time. 
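
The fourfold advantage of temporal multiplexing follows from the quadratic dependence of two-photon excitation on intensity, and can be checked with a short calculation. The sketch assumes a fixed total time-averaged laser power that is shared equally among whichever beams are active, and it ignores pulse structure and saturation, which are the same in both configurations. The function name and the unit power are illustrative.

```python
def relative_fluorescence(total_power, n_beams, n_active):
    """Time-averaged two-photon signal summed over all beams (arbitrary units).

    Each beam is on for a fraction n_active / n_beams of the time, and while on
    it carries total_power / n_active. Excitation scales as power squared.
    """
    duty_cycle = n_active / n_beams
    power_per_active_beam = total_power / n_active
    return n_beams * duty_cycle * power_per_active_beam ** 2

multiplexed = relative_fluorescence(1.0, n_beams=16, n_active=4)
all_on = relative_fluorescence(1.0, n_beams=16, n_active=16)
gain = multiplexed / all_on
print("fluorescence gain from multiplexing:", gain)
```

Concentrating the same average power into four beams at a time quadruples the instantaneous power per beam and increases its excitation rate sixteenfold, while each beam is active for only a quarter of the time, giving a net gain of four.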


Illumination pathways 

Each of the 16 beam pathways contained a pair of kinematically mounted
mirrors, a 1:2 telescope implemented using a pair of lenses (AC254-500-B-ML,
LA1464-B; Thorlabs), and a gimbal-mounted mirror (GMB1/M;
Thorlabs). The 16 beam paths converged on a 6-mm-diameter, Ag- 
coated mirror mounted on a galvanometer scanner (621SHSM40B 
scanner, 67121SHHJ-1HP driver; Cambridge Technologies). This gal- 
vanometer served as our slow-axis scanner. 

To image the 16 beams striking the first scanning mirror onto an 
identical galvanometer scanning mirror serving as the fast-axis scan- 
ner, we used a pair of telecentric f-theta lenses designed to induce 
minimal group velocity dispersion with ultrashort-pulsed illumination 
(S4LFT0089/094; Sill Optics) in a 1:1 telescope configuration (Fig. 1a). 
A third, identical f-theta lens and a tube lens (f = 300 mm, G322-372-525, 
Linos) imaged all 16 beams striking the second scanning mirror onto the 
back aperture of the microscope objective. The objective focused the 16 
beams to a square array of 4 x 4 foci, which together scanned a 2 mm x 2 mm 
specimen area at image frame acquisition rates up to around 8 Hz. 

Alternatively, to enable image frame acquisition rates up to 20 Hz 
over a 2 mm x 2 mm specimen area, we used a resonant galvanometer 
scanner (6SC08KA040-02Y, Cambridge Technology, 8 kHz, 7 mm 
clear aperture) as the fast-axis scanner. The 8 kHz rate of resonant 
line-scanning allowed us to use a data acquisition scheme based on 
line multiplexing instead of pixel multiplexing. In this mode we used 
EOM3 to direct the laser illumination into one of its two optical output 
paths (Extended Data Fig. 1d, phase I and phase IV). During the resonant 
scanner turnaround times, we used EOM1 to redirect the laser illumination 
towards EOM2, the output pathway of which was blocked. During 
both the forward and backward motion of the resonant scanner a set 
of 4 laser beams scanned across a total of 8 image tiles—that is, 2 tiles 
per beam. By using a different set of 4 beams during the forward and 
backward scanning motions, we sampled one image line in all 16 image 
tiles during each cycle of the resonant scanner while using only 8 of the 
16 beam paths. As with the pixel-multiplexing approach, only 4 beams 
were active at any instant in time. 

For the microscope objective lens, we used either an air objective lens 
(Leica, 5.0 x Planapo 0.5 NA; 19 mm working distance; anti-reflection 
(AR) coated for 400-1,000 nm light; transmission >90% at 520 nm, 
>75% at 910 nm) or a water-immersion lens optimized for large-scale 
two-photon imaging (1.0 numerical aperture (NA) for fluorescence collection; 
Jenoptik; 2.5 mm working distance). The illumination 
beams underfilled the back aperture of the microscope objective 
lens, leading to an optical resolution of approximately 1.2 µm and 8 µm 
in the lateral and axial dimensions, respectively, as determined from 
the FWHM values of the microscope’s optical point-spread function. 


Fluorescence collection pathway 

Fluorescence emanating from the sample returned through the objective 
lens, reflected from a dichroic mirror (FF735-Di02-58x82, Semrock) 
and passed through a collection lens (AC508-180-A, Thorlabs) and a 
fluorescence emission filter (FF02-525/40-25, Semrock). 

The objective and the collection lens project a magnified image of the 
fluorescence foci in the sample. To optimize the efficiency of fluorescence 
detection, we designed a custom 4 x 4 lens array (4.5 mm pitch, 
plano-convex lenslets, custom injection-moulded in poly(methyl methacrylate); 
AR-coated: reflectivity <0.5%, 450-650 nm) that efficiently 
coupled fluorescence emissions into a 4 x 4 array of 3-mm diameter 
(0.5 NA) plastic optical fibres (FF-CK-120, AR-coated, FibreFin) (Fig. 1a). 

To capture the maximum amount of fluorescence near the edges 
of the large field of view, the outer lenslets in the array were slightly 
larger than the others, extending outward from the perimeter of the 
array. Because even the outer lenslets had a maximum numerical aper- 
ture (0.19 NA) much lower than that of the plastic fibres (0.5 NA), this 
lenslet design yielded a theoretical efficiency of >97% for coupling 
fluorescence into the array of 16 optical fibres. The fibre array delivered 
the fluorescence to a set of 16 GaAsP photomultiplier tubes (PMT) 
(H10770PA-40, Hamamatsu). Each 400-mm-long fibre had a specified 
transmission efficiency of >98%, yielding an overall design efficiency 
of >95% for conveying fluorescence into the photomultiplier tubes. 
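
The overall design efficiency follows from multiplying the two stage efficiencies. A minimal check, assuming the two quoted lower bounds simply multiply and that no other losses are included (variable names are illustrative):

```python
# Lower bounds quoted for the two stages of the collection pathway.
lenslet_to_fibre_coupling = 0.97  # theoretical coupling into the fibre array
fibre_transmission = 0.98         # specified transmission of each 400-mm fibre

# Assumption: the stages are independent, so their efficiencies multiply.
overall_efficiency = lenslet_to_fibre_coupling * fibre_transmission
print(f"{overall_efficiency:.3f}")
```

The product is just above 0.95, in line with the stated design efficiency of >95%.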


Optomechanics 

We custom-fabricated the majority of the structural components of 
the microscope at our laboratory’s machine shop using high-strength 
7075-aluminium alloy and computer numeric control machining. We 
used three-dimensional (3D) printing to create a cover for the micro- 
scope objective lens and a mount for the dichroic mirror. The optom- 
echanical components were generally catalogue parts from standard 
vendors, mainly Thorlabs, Newport and Linos. 


Data acquisition electronics 

Owing to the unique multiplexing scheme of our microscope, data 
acquisition differs from that in a conventional two-photon microscope 
(Extended Data Fig. 3a). A major concern was to ensure that the signals 
from each of the four phases per pixel clock cycle were correctly 
assigned. This necessitated sampling the 16 PMTs sufficiently rapidly 
to ensure that the signals corresponding to different pixels and phases 
were not conflated. Hence, we chose a sampling rate of 50 MHz for each 
PMT. Because the duration of each of the four multiplexing phases was 
400 ns, this sampling rate yielded 20 samples per pixel per multiplex- 
ing phase (Extended Data Fig. 3b). 

To implement data sampling at this rate, we first converted the photocurrents 
from the 16 PMTs into voltage signals using a set of four transimpedance 
amplifiers, each with four input channels (SR445A, Stanford 
Research Systems). We then sampled the resulting voltage signals using a 
16-channel, 50 MS/s analogue-to-digital converter (ADC; 14-bit samples 
encoded in 2 bytes) module (NI 5751, National Instruments). The ADC connected 
to the NI FlexRIO field programmable gate array (FPGA) Module 
for PXI Express, which was controlled by a host computer (Win 64-bit, 
2 Intel E5-2630 processors, 32 GB RAM, Lenovo) through a PCIe-PXIe 
link (NI PXIe-7962R, NI PXIe-1082 chassis, PXIe-PCIe8381 link, National 
Instruments) (Extended Data Fig. 3a). For each multiplexing phase, the 
FPGA module summed the digitally sampled values of the photocurrents 
into pixel intensities. All subsequent data manipulations involved only 
the pixel intensities, yielding a total data throughput rate of 60 MB s⁻¹ or 
105 MB s⁻¹, for image frame acquisition at 7.23 Hz or 17.5 Hz, respectively, 
as opposed to the 1.6 GB s⁻¹ raw data stream. To eliminate any residual 
crosstalk between pixels resulting from the approximately 50-ns switching 
time of the EOMs, the software interface gave the user the flexibility 
to discard the first few samples of each pixel. 
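
The sampling figures in this section can be reproduced from the stated hardware parameters. The sketch below uses only numbers given in the text; the constant names are illustrative, and the final quantity is a lower bound implied by the switching time rather than a setting reported by the authors.

```python
import math

SAMPLE_RATE_HZ = 50e6       # per-PMT sampling rate
PHASE_DURATION_S = 400e-9   # duration of one multiplexing phase
N_CHANNELS = 16             # one ADC channel per PMT
BYTES_PER_SAMPLE = 2        # 14-bit samples encoded in 2 bytes
EOM_SWITCH_TIME_S = 50e-9   # approximate EOM switching time

# Samples acquired per pixel in each multiplexing phase.
samples_per_phase = round(SAMPLE_RATE_HZ * PHASE_DURATION_S)

# Raw data rate before the FPGA sums samples into pixel intensities.
raw_rate_bytes_per_s = N_CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

# Smallest number of leading samples that spans the switching time.
sample_period_s = 1.0 / SAMPLE_RATE_HZ
min_samples_to_discard = math.ceil(EOM_SWITCH_TIME_S / sample_period_s)

print(samples_per_phase, raw_rate_bytes_per_s / 1e9, min_samples_to_discard)
```

This gives 20 samples per phase and a 1.6 GB s⁻¹ raw stream. At a 20-ns sample period, at least three leading samples fall within a 50-ns switching window.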


Instrument control 

When imaging in pixel-multiplexing mode, we used ScanImage* soft- 
ware (version 3.8) to generate the analogue signals driving the galva- 
nometer scanners and the digital line-clock and frame-clock signals 
(Extended Data Fig. 3a). Using the clock signals from ScanImage, the 
FPGA module generated signals to drive the EOMs. We created custom 
LabVIEW (National Instruments, version 2012 SP1, 32 bit) code to initiate 
the imaging sessions and control the data acquisition parameters. When 
imaging in line-multiplexing mode, we controlled the instrumentation 
fully using custom software written in LabVIEW. We synchronized laser 
line-scanning and data acquisition by using the clock of the resonant 
scanner as a master clock. 

In both imaging modes, the FPGA module continually transmitted 
to the host computer the imaging data in packets of pixels, combined 
into image lines, via a high-speed direct memory access first-in first- 
out (DMA FIFO) data link. The host computer constructed image tiles 
from the image line data, accounting for the number of photodetec- 
tion channels and temporal multiplexing phases. The computer then 
streamed the image data onto its hard drive (Extended Data Fig. 3a). 


Mice 
The Stanford Administrative Panel on Laboratory Animal Care (APLAC) 
approved all procedures involving animals, and we complied with all of 
the panel’s ethical regulations. We analysed data acquired from 6 male 
and 4 female Ai93 triple transgenic GCaMP6f-tTA-dCre mice from the 
Allen Institute (Rasgrf2-2A-dCre/CaMK2a-tTA/Ai93), which expressed 
the Ca²⁺ indicator GCaMP6f in layer 2/3 pyramidal cells. Mice resided 
on a 12-h reverse light cycle in standard plastic disposable cages. Experiments 
occurred during the dark cycle. All animals in the experiment 
belonged to the same group, so blinding and random assignments 
were neither needed nor feasible. 

For illustrative purposes only, we imaged a single tetO-GCaMP6s/ 
CaMK2a-tTA mouse, which expressed the Ca²⁺ indicator GCaMP6s 
in a subset of neocortical pyramidal neurons (Supplementary Video 3). 


Surgical procedures 

At the start of surgery we gave adult mice (12-17 weeks old) buprenorphine 
(0.1 mg kg⁻¹) and carprofen (5 mg kg⁻¹) and anaesthetized them 
with 1-2% isoflurane in O₂. We implanted a glass window within a 5-mm-diameter 
craniotomy positioned over the right visual cortical area V1 
and surrounding cortical tissue. The window was a round #1 cover 
glass (5 mm diameter, 0.15 ± 0.02 mm thickness, Warner Instruments) 
that we attached to a circular steel annulus (1 mm thick, 4.9 mm outer 
diameter, 4.4 mm inner diameter) using adhesive cured with ultraviolet 
light (NOA81, Norland Products). To fill the gap between skull 
and glass window we applied 1.5% agarose. We secured the window 
on the cranium with dental acrylic. We also implanted an aluminium 
metal bar atop the cranium, allowing the mice to be head-restrained 
during in vivo brain imaging. For two days after surgery, we gave the 
mice buprenorphine (0.1 mg kg⁻¹) and carprofen (5 mg kg⁻¹) to reduce 
post-surgical discomfort. Mice recovered for at least one month before 
any imaging experiments began. 


Visual stimulation 

Mice viewed visual stimuli on a gamma-corrected computer monitor 
(Lenovo LT2323p; 58.4 cm diagonal extent) that was 10 cm away 
from the left eye and spanned around 142° of this eye’s accessible, 
angular field of view. We generated visual stimuli using the psycho- 
physics toolbox libraries of the MATLAB (Mathworks; version 2017b) 
programming environment. Stimuli were sinusoidal drifting gratings 
(spatial frequency, 0.04 cycles per degree; stimulus angular diameter, 
50 deg; drifting rate, 50 deg s⁻¹, centred on the left eye’s visual field; 
stimulation duration, 2 s; amplitude modulation depth, 100%; screen 
background intensity, 50%; Fig. 2b). During each experiment, we presented 
the gratings at two different angles, ±30° or ±6° to the vertical, 
in a random sequence. Between successive stimuli, the monitor was 
uniformly illuminated at the background intensity for a 2-s inter-trial 
interval. To prevent light from the visual stimuli from entering the 
fluorescence collection pathway of the microscope, the stimuli used 
only the blue component of the RGB colour model, which was blocked 
by the fluorescence emission filter. We also placed a colour filter (Rosco, 
382 Congo Blue) on the monitor screen. The mean luminance from the 
stimulus at the mouse eye was approximately 5 x10 photonsmm’s", 
which is more than two orders of magnitude higher than the transition 
threshold to photopic vision in mice. 


Imaging sessions 

To reduce the stress of head restraint, we head-fixed mice on a 100-mm-diameter 
Styrofoam ball that could rotate in two angular dimensions. 
We tracked the movement of the ball with an optical computer mouse. 
Because running or walking is known to alter visual processing in 
rodents, we ensured that all visual stimulation trials used for analysis 
were those when the mice were passively viewing the video monitor, 
without locomotion, by excluding all trials during which the mice had 
an ambulatory speed of greater than 0.2 mm s⁻¹. We imaged the Ca²⁺ 
activity of neocortical layer 2/3 pyramidal neurons, 150-250 µm below 
the cortical surface. The pixel clock cycle duration was 1.6 µs, hence 
the pixel dwell time in each of the four multiplexing phases was 400 ns. 
Owing to the ~50-ns switching time of the EOMs, we discarded four 
samples at the start of each phase, removing any crosstalk between 
phases. Across the full duration of each imaging session, fluorescence 
intensities decreased by ~9% owing to photobleaching. The total laser 
illumination power was 280-320 mW, divided evenly amongst the 4 
beams that were active at any instant in time. Hence, each of the 16 
image tiles (each 500 µm x 500 µm in size) received a time-averaged 
power of 17.5-20 mW, for a time-averaged illumination intensity of 
70-80 mW mm⁻². Previous Ca²⁺ imaging studies of layer 2/3 neocortical 
neurons with conventional two-photon microscopy have used 
mean illumination intensities of 89-1,800 mW mm⁻². 
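
The per-tile power and intensity follow from the total power by two divisions and a tile area. A minimal sketch of that arithmetic, using only values stated in the text (the function name and argument names are illustrative):

```python
def tile_illumination(total_power_mw, active_beams=4, phases=4, tile_side_mm=0.5):
    """Return (time-averaged power per tile in mW, intensity in mW per mm^2).

    The total power is split evenly among the beams active at any instant,
    and each tile is illuminated during one of the multiplexing phases.
    """
    instantaneous_power_per_beam = total_power_mw / active_beams
    time_averaged_power_per_tile = instantaneous_power_per_beam / phases
    intensity = time_averaged_power_per_tile / tile_side_mm ** 2
    return time_averaged_power_per_tile, intensity


for total_mw in (280, 320):
    print(total_mw, tile_illumination(total_mw))
```

For 280 mW and 320 mW this gives 17.5 mW and 20 mW per tile, and 70 and 80 mW per mm², matching the quoted ranges.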

For studies in which the visual stimulation comprised moving gratings 
oriented at ±30°, we used the air objective lens and the pixel-multiplexing 
approach to image acquisition. We acquired images with 
1,024 x 1,024 pixels at a 7.23 Hz frame rate across the 2 mm x 2 mm field 
of view. The total imaging duration per session 
was 2,800 s (about 20,000 two-photon image frames), resulting 
in 700 visual stimulation trials, 350 for each of the two visual stimuli. 
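
The frame and trial counts are consistent with the stimulus timing described under Visual stimulation (2-s stimuli separated by 2-s inter-trial intervals). A quick check, assuming trials run back to back for the whole session (variable names are illustrative):

```python
session_duration_s = 2800
frame_rate_hz = 7.23
stimulus_duration_s = 2
inter_trial_interval_s = 2

# Number of image frames acquired in one session.
n_frames = round(session_duration_s * frame_rate_hz)

# Assumption: one trial occupies one stimulus plus one inter-trial interval.
n_trials = session_duration_s // (stimulus_duration_s + inter_trial_interval_s)
n_trials_per_stimulus = n_trials // 2

print(n_frames, n_trials, n_trials_per_stimulus)
```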

For studies in which the moving grating stimuli were oriented at ±6° 
to vertical, we used the water-immersion objective lens and line-multiplexing 
to acquire images with 1,728 x 1,728 pixels at 17.5 Hz across 
the 2 mm x 2 mm field of view, which we averaged and downsampled 
on the FPGA module to 864 x 864 pixels (Extended Data Fig. 9a-c, e, 
Supplementary Videos 2, 3). The total imaging duration per session 
was around 1,500 s. 


Image reconstruction 

We wrote custom MATLAB (Mathworks; version 2017b) scripts to 
manipulate the experimental datasets directly from the computer 
hard drive, without loading all the data into the computer’s random- 
access memory. 

The first step of image reconstruction accounted for the differences 
in the gain values of the 16 PMTs. We determined the gain values by 
imaging a static fluorescence sample and then analysing the statistics 
of the photon shot-noise limited fluorescence detection. Specifically, 
we performed a linear regression between the mean signal from each 
PMT and its variance. In the shot-noise limited regime, the slope of this 
relationship equals the combined gain of the PMT, pre-amplifier and 
ADC. Knowledge of the pre-amplifier and ADC gain values enabled us 
to determine the PMT gain. Given these empirically determined PMT 
gain values, the first step of image reconstruction was normalization 
of the fluorescence signals from each PMT channel by its gain. 
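
The mean-variance calibration rests on Poisson statistics: if the detected photon count N has variance equal to its mean and the recorded signal is g·N, then the signal variance is g times the signal mean. The sketch below illustrates this with noise-free synthetic numbers; the gain values, the data and the assumption that the three gains multiply are all hypothetical and are not the authors' calibration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration: mean photon counts at four brightness levels.
true_combined_gain = 3.0
photon_means = [5.0, 10.0, 20.0, 40.0]
signal_means = [true_combined_gain * m for m in photon_means]
# Poisson counts have variance equal to their mean, so var(g*N) = g^2 * mean(N).
signal_vars = [true_combined_gain ** 2 * m for m in photon_means]

combined_gain, offset = fit_line(signal_means, signal_vars)

# Assumed (hypothetical) pre-amplifier and ADC gains; gains assumed to multiply.
preamp_gain, adc_gain = 1.5, 2.0
pmt_gain = combined_gain / (preamp_gain * adc_gain)
print(combined_gain, pmt_gain)
```

With real data the regression would be run on measured means and variances, and each channel would then be divided by its own estimated gain.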

The second step in image reconstruction was un-mixing of the 
crosstalk between the different PMT channels (Extended Data Fig. 3). 
In principle, when using laser-scanning microscopes with multiple 
illumination beams, one can apply to the set of PMT signal traces an 
un-mixing matrix that represents the inverse of a pre-calibrated, empiri- 
cally determined matrix of crosstalk coefficients between the different 
photodetection channels. However, this approach assumes that the 
biological sample is uniform and hence that a single un-mixing matrix 
will apply equally well across the entire specimen. In practice, brain tis- 
sue is not optically uniform, and it is challenging to precisely determine 
the crosstalk matrix in image sub-regions with low fluorescence levels, 
such as in blood vessels. Furthermore, two-photon neural Ca²⁺ imaging 
routinely involves modest signal-to-noise ratios and consequently the 
application of the inverse crosstalk matrix introduces additional error, 
analogous to the errors introduced by deconvolution methods when 
applied to weak signals. 

For these reasons, we used a more straightforward, conservative and 
computationally efficient method of image reconstruction. Because 
crosstalk was only present in our microscope near the boundaries 
between image tiles, for each of the four sub-frames per image we 
computationally reassigned the signals from the boundary regions 
between tiles to the nearest neighbour source tile from which the cross- 
talk signals originated according to Extended Data Fig. 3c. We empiri- 
cally determined that boundary regions 50 pixels wide contained ~75% 
of the scattered fluorescence photons from each laser focus. Hence, 
computational re-assignment of the photons from these boundary 
regions enabled conservative estimates of cells’ fluorescence signals, 
near continuous stitching of the images (Extended Data Fig. 3d, e), 
and high-fidelity extraction of neural activity (Extended Data Fig. 4). 

Beyond each 50-pixel-wide boundary region, there were generally 
residual scattered fluorescence photons. Thus, for purposes of visual 
display only (Fig. 1b, c; Supplementary Video 1), we removed boundary 
artefacts left over after computational re-assignment (Extended Data 
Fig. 3c) by parameterizing the boundary with a smoothly decaying 
function: 


$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{(x - d)/a}}$$ 


where x is the distance from the tile edge, d = 70 pixels is the width of 
the boundary region, and a = 25 pixels characterizes the smoothness 
of the boundary decay. 


Image pre-processing 

After image reconstruction, each dataset comprised 16 videos, each 
256 pixels x 256 pixels x 21,000 frames for a typical experiment, cor- 
responding to the 16 tiles of each image frame. To correct for lateral 
displacements of the brain during image acquisition, we applied a 
rigid image registration algorithm (Turboreg; http://bigwww.epfl.ch/thevenaz/turboreg/) 
to each of the individual video tiles. We chose 
this approach because the application of a single, rigid image registration 
algorithm over the entire 2 mm x 2 mm field of view did not 
account for variations in tissue motion between the different image 
tiles. After image registration, for display purposes only we merged 
the 16 motion-corrected video tiles into images or videos of the entire 
field of view (Supplementary Videos 1-3). We performed all further 
analysis on individual tiles. 

For display purposes only (Supplementary Video 2, 3), to minimize 
stitching artefacts during video playback we applied to each image 
frame a linear-blending stitching algorithm. We then computationally 
corrected the movie for lateral displacements of the brain by using 
a piecewise rigid image registration algorithm. To highlight the details 
for viewers using a typical computer monitor, we saved the processed 
video using a contrast (γ) value of 0.75. 


Computational extraction of neural activity traces 
To identify individual neurons in the Ca²⁺ imaging data, we separately 
analysed the 16 individual video tiles in each movie and applied an 
established algorithm for cell sorting based on the successive application 
of principal component and independent component analyses 
(Mosaic software, version 0.99.17; Inscopix). We visually screened the 
resulting set of putative cells and removed any that were clearly not 
neurons (about 50% of candidate cells were removed). For the resulting 
set of cells, we created a corresponding set of truncated spatial filters 
that were localized to the cell bodies by setting to zero all pixels in the 
filter with values <5% of the peak amplitude of the filter. After thresholding, 
we removed any connected components containing fewer than 30 
pixels. To obtain traces of neural Ca²⁺ activity, we applied the truncated 
spatial filters to the (F(t) − F₀)/F₀ movies (Extended Data Fig. 5), where 
F(t) denotes the time-dependent fluorescence intensity of each pixel 
and F₀ is its mean intensity value, time-averaged over the entire movie. 
For each cell, we used fast non-negative deconvolution to estimate the 
number of spikes fired in each time bin. We then temporally down-sampled 
the resulting traces twofold by summing the estimated numbers 
of spikes in pairs of adjacent time bins, yielding time bins of 0.276 s. 
We performed all subsequent analysis on the down-sampled traces. 
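
Three of the simpler steps in this pipeline are sketched below: computing the relative fluorescence change with the time-averaged baseline, truncating a spatial filter at 5% of its peak, and twofold down-sampling by summing pairs of bins. This is an illustration under stated assumptions, not the authors' MATLAB code. Filters are represented as flat lists, the function names are made up for the example, and the cell sorting, connected-component removal and spike deconvolution steps are not shown.

```python
def delta_f_over_f(trace):
    """(F(t) - F0) / F0, with F0 the mean of the trace over all time points."""
    f0 = sum(trace) / len(trace)
    return [(f - f0) / f0 for f in trace]


def truncate_filter(weights, fraction=0.05):
    """Zero every filter weight below `fraction` of the peak weight."""
    peak = max(weights)
    return [w if w >= fraction * peak else 0.0 for w in weights]


def downsample_by_summing(spike_counts, factor=2):
    """Sum spike counts in groups of `factor` adjacent bins (drops any remainder)."""
    usable = len(spike_counts) - len(spike_counts) % factor
    return [sum(spike_counts[i:i + factor]) for i in range(0, usable, factor)]


# Two frames at the 7.23 Hz frame rate span roughly 0.28 s.
frame_rate_hz = 7.23
bin_duration_s = 2 / frame_rate_hz

print(delta_f_over_f([1.0, 2.0, 3.0]))
print(truncate_filter([1.0, 0.04, 0.2]))
print(downsample_by_summing([1, 0, 2, 3, 4]))
```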

Moreover, previous work has shown that the activity of mouse visual 
cortical neurons differs substantially between behavioural states of passive 
viewing and viewing during active locomotion. To ensure that 
all visual stimulation trials used for analysis were those when the mice 
were passively viewing the video monitor, we excluded from analysis all 
trials during which the mice were running or walking (at speeds greater 
than 0.2 mm s⁻¹). The resulting set of trials retained for data analysis in 
each mouse was 217-332 for each stimulus condition, except for the 
analysis of Extended Data Fig. 9a-c, e, which involved 122-167 trials 
per stimulus condition. 


Trial-shuffled datasets 

To create trial-shuffled datasets, we randomly permuted the activ- 
ity traces of each cell across the full set of trials in which the same 
stimulus was presented, using a different random permutation for 
each individual cell. Thus, the trial-shuffled datasets preserved the 
statistical distributions of each cell’s responses to the two stimuli, 
but any temporally correlated fluctuations in different cells’ stimulus- 
evoked responses were scrambled. For analyses of trial-shuffled data, 
we averaged results over 100 different randomly chosen subsets of cells 
and/or stimulation trials, each of which was trial-shuffled with its own 
distinct permutations; exceptions to this statement are the analyses 
of Extended Data Figs. 8c-h, 10a, b, for which we averaged results over 
30 such calculations instead of 100. 
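
The shuffling procedure can be sketched as follows for the trials of a single stimulus. The data layout (`responses[cell][trial]`) and the function name are assumptions made for this illustration.

```python
import random


def trial_shuffle(responses, rng):
    """Return a trial-shuffled copy of `responses` for ONE stimulus.

    `responses[cell][trial]` holds one cell's response on one trial. Each cell
    gets its own independent permutation of trials, which preserves that
    cell's response distribution but scrambles trial-by-trial co-fluctuations
    between cells.
    """
    shuffled = []
    for cell_trials in responses:
        permuted = list(cell_trials)  # copy so the input is left untouched
        rng.shuffle(permuted)
        shuffled.append(permuted)
    return shuffled


rng = random.Random(0)  # fixed seed so the example is reproducible
responses = [[1, 2, 3, 4], [10, 20, 30, 40]]
shuffled = trial_shuffle(responses, rng)
print(shuffled)
```

Applying the function separately to each stimulus keeps trials of different stimuli from being mixed.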


Noise correlations in the visual stimulus-evoked responses of 
pairs of cells 

To compute correlation coefficients for the noise in the visual responses 
of a pair of neurons, we first integrated the estimated spike count of 
each cell over the interval [0.5 s, 2 s] from the start of visual stimulation. After 
separating the trials for each of the two visual stimuli, we subtracted 
from each trace the mean stimulus-evoked response of the cell and 
then calculated the Pearson correlation coefficient, r, for the resulting 
set of responses from the two cells. We then averaged these noise cor- 
relation coefficients over the two stimulus conditions. Figure 2d, e and 
Extended Data Fig. 6e, g show statistical distributions of the resulting 
mean correlation coefficients across many cell pairs. 
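
A minimal sketch of the noise-correlation calculation for one pair of cells is given below. The dictionary layout and function names are illustrative assumptions; the Pearson coefficient subtracts each cell's mean response internally, which corresponds to the mean-subtraction step described above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def noise_correlation(cell_1, cell_2):
    """Average the per-stimulus Pearson r of two cells' trial-by-trial responses.

    `cell_1` and `cell_2` map a stimulus label to a list of integrated spike
    counts, one per trial, with trials in the same order for both cells.
    """
    per_stimulus = [pearson(cell_1[s], cell_2[s]) for s in cell_1]
    return sum(per_stimulus) / len(per_stimulus)


# Toy data: perfectly correlated for stimulus A, anti-correlated for stimulus B.
cell_1 = {"A": [1, 2, 3, 4], "B": [5, 7, 6, 8]}
cell_2 = {"A": [2, 4, 6, 8], "B": [8, 6, 7, 5]}
print(noise_correlation(cell_1, cell_2))
```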

We compared the statistical distributions of mean correlation coef- 
ficients for two different sets of cell pairs, those with positive and those 
with negative covariance of their mean stimulus responses (that is, cell 
pairs with similar or dissimilar visual tuning) (Extended Data Fig. 6e, 
g). To visually highlight the differences between the two distributions 
(Fig. 2e), we also analysed only the most responsive cells, defined as 
those cells with the top 10% values of √(r_A² + r_B²), where r_A and r_B are 
the mean responses to the two stimuli. 


Dimensionality reduction and computation of d’ for neural 
responses to visual stimuli 

To estimate how much information the neural activity conveyed about 
the stimulus identity, we used the metric d’, which characterizes how 
readily the distributions of the neural responses to the two different 
sensory stimuli can be distinguished. The quantity (d’)² is the discrete 
analogue of Fisher information. We evaluated three different 
approaches to computing d’ values for the discrimination of the two 
different visual stimuli (Fig. 3). 

In the first approach, which we termed ‘instantaneous decoding’ 
(Fig. 3d, f, Extended Data Figs. 7a, 9a), we chose for analysis a specific 
time bin relative to the onset of visual stimulation. To examine the 
time-dependence of d’, we used the instantaneous decoding approach 
and varied the selected time bin from t = 0 s to t = 2 s relative to the 
start of the trial. The number of dimensions of the neural ensemble 
activity evoked in response to the visual stimulus was N_o, the number 
of recorded neurons (N_o ≈ 1,500). Said differently, the set of estimated 
spike traces provided an N_o-dimensional population vector response 
to each stimulus presentation. 

In the second approach, termed ‘cumulative decoding’ (Fig. 3e, g, 
Extended Data Figs. 7b, 9b), we concatenated the responses of each 
neuron over time, from the start of the trial up to a chosen time, t. In 
this case, the dimensionality of the population activity vector was 
N_o x N_t, where N_t is the number of time bins spanning the interval [0 s, t]. 

In the third approach, termed ‘integrated decoding’ (Extended Data 
Fig. 7c), we examined the neural ensemble responses integrated over 
the interval [0 s, 2 s] relative to stimulation onset. In the plots of 
d’ against time as computed by instantaneous decoding, the interval 
[0.5 s, 2 s] is when the d’ values have already reached an approximate plateau 
(Extended Data Fig. 7e). With integrated decoding, the dimensionality 
of the population vector response was N_o, the number of recorded 
neurons, as in the instantaneous decoding approach. 

In each of the three decoding approaches, we arranged the traces of 
estimated spike counts into three-dimensional data structures (number 
of neurons x number of time bins x number of trials), for each of the 
two visual stimuli (Extended Data Fig. 5b). 

A challenge was that calculation of d’ in an N_o-dimensional population 
vector space would have involved estimation of an N_o x N_o noise 
covariance matrix with over a million matrix elements. Direct estimation 
of the covariance matrix would have been unreliable, because the 
typical number of cells per dataset, N_o ≈ 1,500, was much larger than 
the typical number of trials, P = 600. This issue was even more severe 
in the case of cumulative decoding, for which the population activity 
vector had N_o x N_t dimensions. However, we found mathematically 
that by reducing the dimensionality of the space used to represent the 
ensemble neural responses, one can reliably estimate eigenvalues for 
the largest eigenvectors of the noise covariance matrix, which govern 
how well the two visual stimuli can be discriminated based on the neural 
responses (Appendix). 

Our approach to dimensional reduction relied on a partial least squares (PLS) 
discriminant analysis. The PLS analysis enabled us to find the dimensions of 
the population vector space that were most informative about which 
visual stimulus was shown. To determine how many dimensions were 
important for discriminating the two stimuli, we constructed an orthonormal 
projection operator, which projected the N_o-dimensional (or 
N_o x N_t dimensional) ensemble neural responses onto a truncated set 
of the N_R dimensions identified by the PLS analysis as being the most 
informative about the identity of the visual stimulus. 

In the reduced space with N_R dimensions, we calculated the (d’)² value 
of the optimal linear discrimination strategy as: 

$$(d'_{\mathrm{opt}})^2 = \Delta\mu^{T} \Sigma^{-1} \Delta\mu = \Delta\mu^{T} w_{\mathrm{opt}},$$ 

where Σ = ½(Σ_A + Σ_B) is the noise covariance matrix averaged across the two 
stimulation conditions, Δμ = μ_A − μ_B is the vector difference between 
the mean ensemble neural responses to the two stimuli and w_opt = 
Σ⁻¹Δμ, which is normal to the optimal linear discrimination hyperplane 
in the response space. 
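
As a worked illustration of the optimal linear d’, the sketch below evaluates the squared discriminability for a two-dimensional toy example by solving the linear system for the decoding weights instead of inverting the covariance matrix explicitly. The solver, the function names and the numbers are illustrative assumptions; in the analysis described here the calculation is carried out in the reduced PLS space.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def d_prime_squared_optimal(mu_a, mu_b, cov_a, cov_b):
    """Return ((d'_opt)^2, w_opt) for two response distributions."""
    delta_mu = [a - b for a, b in zip(mu_a, mu_b)]
    # Noise covariance averaged over the two stimulus conditions.
    cov = [[0.5 * (x + y) for x, y in zip(row_a, row_b)]
           for row_a, row_b in zip(cov_a, cov_b)]
    w_opt = solve(cov, delta_mu)  # w_opt = inverse(cov) @ delta_mu
    return sum(d * w for d, w in zip(delta_mu, w_opt)), w_opt


# Toy example with positively correlated noise in two dimensions.
cov = [[2.0, 1.0], [1.0, 2.0]]
d2_opt, w_opt = d_prime_squared_optimal([2.0, 1.0], [0.0, 0.0], cov, cov)
print(d2_opt, w_opt)
```

For this toy case the weights are (1, 0) and the squared discriminability is 2.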

To determine the optimal value of N_R for these computations of d’, 
we split the data into three sets, each comprising a third of all trials. We 
used the first set to identify the PLS dimensions, the second ‘training’ 
set to find the optimal discrimination boundary defined by w_opt, and 
the third ‘test’ set to estimate the discrimination performance d’. We 
then varied N_R and plotted the resulting d’ values for both the training 
and test datasets (Extended Data Fig. 7a-c). 

For all three decoding strategies, we chose N_R = 5 for all subsequent 
determinations of d’, because the addition of further dimensions led 
to overfitting, as shown by the increase in discrimination performance 
using the training set and the decline in performance (that is, poorer 
generalization to previously unseen data) using the test set (Extended 
Data Fig. 7a-c). 

After picking N_R = 5, for all further computations of d’ we first chose a 
subset of neurons and divided the set of stimulation trials into two groups 
of equal size. We used the first group of trials to conduct the PLS analysis 
and the second group to determine d’ and the eigenvalue spectrum of 
the noise covariance matrix (Extended Data Fig. 5b). To make plots of d’ 
(Fig. 3d-g), we averaged d’ values across 100 different randomly chosen 
subsets of cells, which we analysed independently for every time bin. 
For each subset of cells and every time bin, we randomly split the set of 
visual stimulation trials into two halves, one half for determination of the 
five-dimensional sub-space and decoder training, and the other half for 
decoder testing. In Fig. 3d, e, we kept constant the number of cells per 
subset. In Fig. 3f, g, we varied the number of cells per subset. For instantaneous 
and cumulative decoders in the experiment with visual gratings 
oriented at ±30°, we used [0.83 s, 1.11 s] and [0 s, 1.11 s] time intervals, 
respectively (Fig. 3f-i). For the experiment with gratings oriented at ±6°, 
the time intervals used for instantaneous and cumulative decoding were 
respectively [0.70 s, 0.94 s] and [0 s, 0.94 s] (Extended Data Fig. 9a-c). 

To determine the asymptotic value of d’ in the limit of many neurons, 
and the number of cells, n_{1/2}, at which (d’)² attains half of its asymptotic 
value (Fig. 3h, i), we performed a two-parameter fit to the growth of d’ 
with increasing numbers of neurons, n: (d’)² = sn/(1 + εn). We determined 
the asymptotic value of d’ as (s/ε)^(1/2) and n_{1/2} as ε⁻¹. 
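
One way to carry out such a two-parameter fit is to linearize the saturating form: taking reciprocals of (d’)² = sn/(1 + εn) gives 1/(d’)² = (1/s)(1/n) + ε/s, which is a straight line in 1/n. The sketch below uses this linearization on noise-free synthetic values. It is an illustrative substitute for whatever fitting routine the authors used, which the text does not specify, and the parameter values are hypothetical.

```python
def fit_saturation(ns, d_prime_sq):
    """Fit (d')^2 = s*n / (1 + eps*n) using 1/(d')^2 = (1/s)*(1/n) + eps/s."""
    xs = [1.0 / n for n in ns]
    ys = [1.0 / v for v in d_prime_sq]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    s = 1.0 / slope
    eps = intercept * s
    return s, eps


# Hypothetical parameters used to generate noise-free test values.
s_true, eps_true = 0.05, 0.002
ns = [50, 100, 200, 400, 800, 1600]
values = [s_true * n / (1 + eps_true * n) for n in ns]

s_fit, eps_fit = fit_saturation(ns, values)
d_prime_asymptote = (s_fit / eps_fit) ** 0.5  # large-n limit of d'
n_half = 1.0 / eps_fit                        # n at which (d')^2 is half its limit
print(s_fit, eps_fit, d_prime_asymptote, n_half)
```

With noisy data the reciprocal transform weights small d’ values heavily, so a direct nonlinear least-squares fit may be preferable in practice.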

To verify that linear decoding is a near optimal decoding strategy, we 
confirmed that the noise covariance matrix Σ was stimulus-independent 
in the reduced, five-dimensional space used to calculate d’ (Extended 
Data Fig. 7f). We found that the matrix elements of the noise covariance 
matrix were highly correlated across the two stimulus conditions 
(r: 0.81 ± 0.16, mean ± s.d., N = 5 mice). This indicates that other more 
complex, nonlinear decoding strategies are unlikely to substantially 
surpass the accuracy of the linear strategy, which we further confirmed 
via an analysis of quadratic decoding (Extended Data Fig. 7h). 

We also verified that we had sufficient numbers of visual stimulation 
trials to estimate d’ accurately (Extended Data Fig. 7g). For every mouse, 
d’ approached an asymptote as the number of stimulation trials used 
for analysis was increased; this indicates that beyond a certain point 
the computed value of d’ is insensitive to the number of trials. Moreo- 
ver, we developed an analytic theory describing how the accuracy of 
our estimates of d’ depends jointly on the numbers of neurons and 
experimental trials (Extended Data Fig. 10f-k, Appendix). 

In addition to our analyses of real data, we also calculated (d’_shuffled)² 
(Fig. 3b-g), the optimal linear discrimination performance using trial-shuffled 
datasets, which we created by shuffling the responses of each 
cell across stimulation trials of the same type. Owing to this shuffling 
procedure, the off-diagonal elements of Σ_A and Σ_B become near zero. 

We further calculated the performance of a ‘diagonal’ discrimination 
strategy (Fig. 3b, d, e) that was blind to the noise correlations between 
neurons, using the actual (unshuffled) datasets. For this sub-optimal 
strategy, (d’_diagonal)² determines the separation of two response distributions 
obtained when the vector of decoding weights w is collinear 
with Δμ (Fig. 3), which we calculated according to: 

$$(d'_{\mathrm{diagonal}})^2 = \frac{(\Delta\mu^{T} \Sigma_d^{-1} \Delta\mu)^2}{\Delta\mu^{T} \Sigma_d^{-1} \Sigma \Sigma_d^{-1} \Delta\mu},$$ 

where Σ_d is the diagonal covariance matrix. 
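
The correlation-blind calculation can be illustrated as follows. The sketch assumes the standard form of the diagonal decoder, in which the weights are the mean difference divided by each dimension's own variance, while the noise along those weights is still evaluated with the full covariance matrix. The function name and toy numbers are illustrative.

```python
def d_prime_squared_diagonal(delta_mu, cov):
    """(d'_diagonal)^2 for weights that ignore the off-diagonal covariances."""
    n = len(delta_mu)
    # Weights built from the diagonal of the covariance matrix only.
    w = [delta_mu[i] / cov[i][i] for i in range(n)]
    signal = sum(w[i] * delta_mu[i] for i in range(n))
    # Noise variance along w, using the FULL covariance matrix.
    noise = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return signal ** 2 / noise


delta_mu = [2.0, 1.0]
correlated = [[2.0, 1.0], [1.0, 2.0]]
uncorrelated = [[2.0, 0.0], [0.0, 2.0]]

d2_diag_correlated = d_prime_squared_diagonal(delta_mu, correlated)
d2_diag_uncorrelated = d_prime_squared_diagonal(delta_mu, uncorrelated)
print(d2_diag_correlated, d2_diag_uncorrelated)
```

For the correlated toy matrix the result (about 1.79) is below the optimal value of 2 obtained with the full inverse covariance. With no correlations the diagonal and optimal decoders coincide (2.5 in this example).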


Eigenvalues of the noise covariance matrix 

To examine how the statistical structure of neural noise affects the abil- 
ity to discriminate neural responses to the two different visual stimuli 
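(Fig. 4, Extended Data Fig. 10a-e), we expressed $(d')^2$ in terms of the 
eigenvalues $\lambda_\alpha$ and eigenvectors $e_\alpha$ of the noise covariance matrix $\Sigma$: 


$$(d')^2 = \Delta\mu^{T} \Sigma^{-1} \Delta\mu = \sum_\alpha \frac{(\Delta\mu \cdot e_\alpha)^2}{\lambda_\alpha}$$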


which can be viewed as a sum of signal-to-noise ratios, one for each 
eigenvector. Clearly, the eigenvectors well aligned with $\Delta\mu$ are the most 
important for discriminating between the two distributions of neural 
responses. Noting that $\lambda_\alpha$ equals the noise variance along $e_\alpha$, our data 
revealed noise modes that were well aligned with $\Delta\mu$ and for which the 
variance increased linearly with the number of cells. The combination of 
these two attributes is what leads to the saturation of d’ as the number 
of cells in the ensemble becomes large (Fig. 4). Notably, our analysis 
also uncovered noise modes with much larger variance that are not 
information-limiting, as they do not align well with $\Delta\mu$. 
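
The eigen-decomposition view can be checked numerically. The sketch below, which uses a randomly generated covariance matrix as a stand-in for real data, confirms that summing the squared projection of Δμ onto each eigenvector, divided by that eigenvector's eigenvalue, reproduces ΔμᵀΣ⁻¹Δμ.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 6

# Hypothetical symmetric positive-definite noise covariance and mean difference.
a = rng.normal(size=(n_cells, n_cells))
cov = a @ a.T + n_cells * np.eye(n_cells)
dmu = rng.normal(size=n_cells)

# eigh returns eigenvalues in ascending order; eigenvectors are the columns.
eigvals, eigvecs = np.linalg.eigh(cov)

# One signal-to-noise ratio per noise mode: (dmu . e)^2 / eigenvalue.
snr_per_mode = (eigvecs.T @ dmu) ** 2 / eigvals
d2_from_modes = snr_per_mode.sum()

# Direct evaluation of dmu^T cov^{-1} dmu for comparison.
d2_direct = dmu @ np.linalg.solve(cov, dmu)
```

Inspecting `snr_per_mode` shows which modes dominate: a mode contributes strongly only if it is both well aligned with Δμ and has small variance.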


Calculation of decoding weights 
We calculated the vector of optimal linear decoding weights, $w_{\mathrm{opt}}$, in 
the reduced space identified by PLS analysis: 

$$w_{\mathrm{opt}} = \Sigma^{-1} \Delta\mu$$

For moving grating visual stimuli oriented at ±30°, $w_{\mathrm{opt}}$ was generally 
well aligned to $\Delta\mu$, indicating that correlation-blind decoding per- 
formed near optimally (Figs. 3b, h, 4a, c). This was somewhat less the 
case with moving gratings oriented at ±6° (Extended Data Fig. 9c). To 
assess the contributions of individual cells to the optimal decoder, we 
estimated the vector of decoding weights in the space of all neurons as: 


$$w_{\mathrm{decoding}} = T^{T} w_{\mathrm{opt}}$$

where $T$ is a transformation matrix from the high-dimensional popula- 
tion vector space, in which the responses of each cell occupy an indi- 
vidual dimension, into the five-dimensional space identified by PLS 
analysis. Starting around 0.4 s after the onset of visual stimulation, 
$w_{\mathrm{decoding}}$ was largely time-invariant (Extended Data Fig. 7d). 


L2-regularized regression 

Because our method for computing d’ via PLS analysis involved a dimen- 
sional reduction, we compared the d’ values found with PLS analysis to 
those determined via a different method, L2-regularized regression”, 
which does not depend on dimensional reduction (Extended Data 
Fig. 8a, b). This form of regression uses a regression vector, b, that lies 
within the high-dimensional space of all ensemble neural activity pat- 
terns, but its length is limited by the use of an adjustable regularization 
parameter, k. For each subset of neurons considered, we randomly 
chose 90% of the visual stimulation trials for the determination of b. 
We projected the neural responses from the remaining 10% of trials 
onto the dimension determined by b. We then computed d’ with the 
same formula as used with PLS analysis, except with b replacing $w_{\mathrm{opt}}$, 
the optimal linear discrimination hyperplane. Using this approach, we 
found the maximum value of d’ across all values of k within the range [1, 
10°]. We averaged these maximal d’ values across 100 different subsets 
of neurons and visual stimulation trials (Extended Data Fig. 8a). 
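
A minimal sketch of this cross-validated procedure follows. It assumes the closed-form ridge solution, synthetic Gaussian responses, and a d′ defined as the separation of the projected means divided by the root of the averaged variance. The grid of regularization values is illustrative only, and none of the names below come from the authors' code.

```python
import numpy as np

def ridge_vector(x, y, k):
    """Closed-form L2-regularized regression: b = (X^T X + k I)^{-1} X^T y."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(xc.T @ xc + k * np.eye(x.shape[1]), xc.T @ yc)

def dprime_1d(proj_a, proj_b):
    """d' of two 1-D samples: mean separation over the root of the averaged variance."""
    pooled_var = 0.5 * (proj_a.var(ddof=1) + proj_b.var(ddof=1))
    return abs(proj_a.mean() - proj_b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(2)
n_trials, n_cells = 1000, 20
labels = np.repeat([-1.0, 1.0], n_trials // 2)    # two stimulus types
dmu = np.full(n_cells, 0.5)                        # hypothetical mean difference
x = rng.normal(size=(n_trials, n_cells)) + 0.5 * labels[:, None] * dmu

# 90% of trials determine b; the remaining 10% are used to evaluate d'.
order = rng.permutation(n_trials)
n_train = int(0.9 * n_trials)
train, held_out = order[:n_train], order[n_train:]

best_dprime = 0.0
for k in np.logspace(0, 3, 7):                     # illustrative range of k
    b = ridge_vector(x[train], labels[train], k)
    proj = x[held_out] @ b
    d = dprime_1d(proj[labels[held_out] < 0], proj[labels[held_out] > 0])
    best_dprime = max(best_dprime, d)
```

In a full analysis this loop would be repeated over many random subsets of neurons and trials and the maximal d′ values averaged, as described above.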


Kullback-Leibler divergence 

To assess the extent to which quadratic decoding might surpass the 
optimal linear decoder, we computed the Kullback-Leibler (KL) diver- 
gence” between the two distributions of ensemble neural responses 
to the two different visual stimuli (Extended Data Fig. 7h). The KL 
divergence is a generalization of d’ to arbitrary distributions and, like 
d’, provides an assessment of the statistical differences between two 
distributions. When the two distributions are Gaussians with equal 
covariance matrices, the KL divergence reduces to $(d')^2$, and linear 
decoding methods suffice to optimally discriminate between the two 
distributions”. By comparison, for two Gaussian distributions with 
different means and covariance matrices, $(d')^2$ is not equivalent to 
the KL divergence, and quadratic decoding methods are required to 
optimally discriminate between the two distributions”. 


To assess the potential benefits of quadratic decoding, we fit multi- 
variate Gaussians to the two stimulus response distributions without 
assuming they had equal covariance matrices. We computed the KL 
divergence of the response distribution to stimulus A relative to the 
response distribution to stimulus B according to: 


$$\mathrm{KL}_{AB} = \frac{1}{2}\left\{ \mathrm{tr}\left(\Sigma_B^{-1}\Sigma_A\right) + \Delta\mu^{T}\Sigma_B^{-1}\Delta\mu - N + \ln\frac{\det\Sigma_B}{\det\Sigma_A} \right\}$$

where $\Sigma_A$, $\Sigma_B$ are the noise covariance matrices for the two stimulation 
conditions, $\Delta\mu = \mu_A - \mu_B$ is the vector difference between the mean 
ensemble neural responses to the two stimuli, and $N$ is the dimension- 
ality of the response distribution (that is, the number of cells in the 
ensemble). The KL divergence saturated as $N$ increased and was gen- 
erally not much greater than $(d')^2$ (Extended Data Fig. 7h). This result 
was consistent with the finding that the noise covariance matrix was 
similar for the two different visual stimuli (Extended Data Fig. 7f) and 
supported the conclusion that quadratic decoding would achieve little 
performance gain beyond that of the optimal linear decoder. 
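
For reference, the KL divergence between two multivariate Gaussians can be evaluated as in the sketch below, which uses the standard closed form with natural logarithms. The function name is hypothetical. Under this standard normalization, equal covariance matrices give a value of (d')²/2, that is, proportional to (d')²; the normalization convention used in the original analysis may differ.

```python
import numpy as np

def kl_gaussian(mu_a, cov_a, mu_b, cov_b):
    """KL divergence of N(mu_a, cov_a) relative to N(mu_b, cov_b), in nats."""
    n = mu_a.size
    dmu = mu_a - mu_b
    trace_term = np.trace(np.linalg.solve(cov_b, cov_a))   # tr(cov_b^{-1} cov_a)
    mean_term = dmu @ np.linalg.solve(cov_b, dmu)           # dmu^T cov_b^{-1} dmu
    # slogdet avoids overflow or underflow of the determinant for large matrices.
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_b = np.linalg.slogdet(cov_b)
    return 0.5 * (trace_term + mean_term - n + logdet_b - logdet_a)

# Equal covariances: the result depends only on dmu^T cov^{-1} dmu.
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
mu_a, mu_b = np.array([1.0, 0.0]), np.zeros(2)
kl_equal_cov = kl_gaussian(mu_a, cov, mu_b, cov)
d2 = (mu_a - mu_b) @ np.linalg.solve(cov, mu_a - mu_b)
```

When the two covariance matrices differ, the trace and log-determinant terms no longer cancel, and the divergence exceeds the purely mean-based term.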


Computational studies of the robustness of empirically 
determined d’ values 
To verify that our decoding methods were robust to the potential pres- 
ence of effects such as common mode fluctuations and multiplicative 
gain modulation that could increase the trial-to-trial variability of neu- 
ral responses, we compared the d’ values obtained from PLS analysis 
versus L2-regularized regression using computationally simulated 
datasets of neural population responses (Extended Data Fig. 8c—h). 
First, to examine the combined effects of information-limiting cor- 
relations and common mode fluctuations (Extended Data Fig. 8c-f), 
we studied a model of the neural ensemble responses in which the noise 
covariance matrix exhibited information-limiting noise correlations via 
a single eigenvector, $f$, the eigenvalue of which grew linearly with the 
number of cells in the ensemble. In addition to this rank 1 component, 
we included a noise term that was uncorrelated between different cells, 
as well as a common mode fluctuation, yielding a noise covariance 
matrix with the form 


$$\Sigma = \sigma^2 I + \varepsilon_{\mathrm{common}} J + \varepsilon f f^{T}$$


where $\sigma^2 = 1$ is the amplitude of uncorrelated noise, $I$ is the identity 
matrix, $J$ is a rank 1 matrix of all ones, and $f$ is the information-limiting 
direction, a vector that we chose randomly in each individual simu- 
lation from a multi-dimensional Gaussian distribution with unity 
variance in each dimension. The amplitude of information-limiting 
correlations was $\varepsilon = 0.002$, approximately matching the level observed 
in the experimental data. In the model version without common mode 
fluctuations, we set $\varepsilon_{\mathrm{common}}$ to zero. In the version with common mode 
fluctuations, we set $\varepsilon_{\mathrm{common}} = 0.02$, ten times the value of $\varepsilon$. We chose 
the difference in the means of the two stimulus response distributions, 
$\Delta\mu$, to be aligned with $f$ (Fig. 3a) and to have a magnitude of 0.2, so that 
the asymptotic value of d’ for large numbers of cells approximately 
matched that of the data. We compared the decoding results attained 
with and without the presence of common mode fluctuations in the 
neural responses. 
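
The saturation produced by this kind of covariance model can be reproduced with a short sketch. It assumes a covariance of the form (unit-variance independent noise) + (common-mode term proportional to the all-ones matrix) + (rank 1 term along a random direction f), and it assumes that the mean difference equals 0.2 times f. Both the function name and that scaling choice are assumptions for illustration.

```python
import numpy as np

def model_d2(n_cells, eps=0.002, eps_common=0.0, sigma2=1.0, scale=0.2, seed=0):
    """(d')^2 for a simulated ensemble with information-limiting correlations."""
    rng = np.random.default_rng(seed)
    f = rng.normal(size=n_cells)                       # information-limiting direction
    cov = (sigma2 * np.eye(n_cells)                    # uncorrelated noise
           + eps_common * np.ones((n_cells, n_cells))  # common mode fluctuation
           + eps * np.outer(f, f))                     # rank 1 limiting component
    dmu = scale * f                                    # assumption: dmu = scale * f
    return float(dmu @ np.linalg.solve(cov, dmu))

sizes = (10, 100, 1000, 4000)
d2_by_size = {n: model_d2(n) for n in sizes}
d2_common = model_d2(1000, eps_common=0.02)

# Analytic ceiling without common mode noise: scale^2 / eps.
ceiling = 0.2 ** 2 / 0.002
```

Under these assumptions (d')² grows with ensemble size but never exceeds `ceiling`, which corresponds to an asymptotic d′ of roughly 4.5. Adding the common mode term cannot increase discriminability, because it only adds variance.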

Second, to study the possible effects of multiplicative gain modula- 
tion (Extended Data Fig. 8g, h), we compared two versions of a model 
in which the responses of the V1 neural population either were or were 
not subject to a multiplicative stochastic gain modulation but were 
otherwise statistically equivalent. We modelled the V1 cell population 
as a set of linear Gabor filters (see Appendix section 5). In the version 
with gain modulation, on each visual stimulation trial we multiplied 
the output of the Gabor filter by a randomly chosen factor, uniformly 
distributed between 50% and 150%, the value of which was the same for 
every cell but varied from trial to trial. 


Estimates of perceptual acuity 

We used the empirical determinations of d’ based on visual cortical 
activity and the parameters of the moving grating visual stimuli to 
estimate the minimum perceptible orientation difference between 
the two stimuli. We compared the resulting values to those estimated 
from past behavioural measurements of visual acuity in mice, all 
of which agree well. 

One behavioural study assessed how well three individual mice could 
discriminate the orientations of visual gratings™. The best trained of 
these three mice (that is, the mouse that performed the most sessions 
and had the smallest error bars in the threshold determination) had a 
behavioural threshold for orientation discrimination (4.6° ± 0.1°; n = 7 
sessions) close to the value estimated from our neural data (4.8°). The 
second mouse had a 5.7° ± 0.6° threshold (n = 4 sessions), and the third 
mouse had a threshold of 6.9° (n = 1 session). 

Another behavioural study examined visual acuity in 13 mice and 
determined the highest visual spatial frequencies the mice could dis- 
cern’. To compare our results to this study, we used the fact that our 
grating stimuli had a low spatial frequency (0.04 cycles per degree) to 
approximate the perceptual challenge of estimating the grating ori- 
entation as being equivalent to that of estimating the orientation of 
the line of peak illumination intensity over the same viewing diameter. 
In the behavioural study of acuity’, the mice used both eyes to view 
the stimulus, whereas in our studies mice viewed the stimulus with one 
eye, and we recorded neural activity from only one cerebral hemisphere. 
To account for these differences, we posited that neural noise fluctua- 
tions should be nearly independent across the visual streams from the 
two eyes, which would boost d’ values by about a factor of √2 over those 
achievable with one eye. However, our determinations of d’ from neu- 
ral activity concern the discrimination of two distinct visual stimuli, 
which should also increase d’ values by a factor of about √2 over those 
for a single stimulus viewed with one eye. Given these counterbalanc- 
ing factors, we used the d’ values to estimate the highest perceptible 
spatial frequency as $f = d'(\theta)/(D \sin\theta)$, where $D$ is the diameter of the 
visual stimuli (50 deg; Fig. 2b) presented at orientations of ±θ. For the 
grating stimuli oriented at ±30° to vertical, d’ = 6, yielding f = 0.3 cycles 
per degree. For the grating stimuli oriented at ±6°, which are more 
representative of the perceptual threshold, d’ = 2.5 and thus f = 0.48 
cycles per degree, comparable to the value of f = 0.5 cycles per degree 
attained from the behavioural studies at a unity d’ value for the behav- 
ioural performance”. We converted values of $f$ into the minimum per- 
ceptible orientation difference, $2\theta_{\min}$, between two grating stimuli 
oriented at $\pm\theta_{\min}$, by using $\theta_{\min} = \sin^{-1}(1/(Df))$. This conversion yielded a 
prediction of $\theta_{\min} \approx$ 2.3° based on the behavioural studies of mouse 
visual acuity”, as compared to $\theta_{\min} =$ 2.4° based on our neural data. 
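
The arithmetic behind these estimates can be reproduced with a few lines. The sketch assumes the relations f = d′/(D sin θ) and θ_min = arcsin(1/(D f)) with D = 50 degrees, and the function names are hypothetical.

```python
import math

def highest_spatial_frequency(d_prime, theta_deg, diameter_deg=50.0):
    """f = d'(theta) / (D sin(theta)), in cycles per degree."""
    return d_prime / (diameter_deg * math.sin(math.radians(theta_deg)))

def min_orientation_deg(f, diameter_deg=50.0):
    """theta_min = arcsin(1 / (D f)), returned in degrees."""
    return math.degrees(math.asin(1.0 / (diameter_deg * f)))

# Gratings at +/-6 degrees with d' = 2.5 (values quoted in the text).
f_neural = highest_spatial_frequency(2.5, 6.0)

# Minimum perceptible orientation from the neural and behavioural frequencies.
theta_neural = min_orientation_deg(0.48)
theta_behaviour = min_orientation_deg(0.5)
```

With these inputs, `f_neural` rounds to 0.48 cycles per degree, and the two angles round to 2.4° and 2.3°, so the full orientation differences (twice these angles) are about 4.8° and 4.6°.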


Computational simulations of activity in a two-layer neural 
network 

To illustrate that cells whose receptive fields overlap exhibit shared 
noise correlations, we simulated a simple two-layer feed-forward net- 
work of linear neurons, with 14 input neurons and 3 output neurons 
(Extended Data Fig. 1j-m). The neurons in each layer were equally 
spaced along a linear axis. We defined the strengths of the connec- 
tions, $w_{ij}$, between the input and output neurons such that the receptive 
field profiles of the different output neurons were spatially overlap- 
ping Gaussian functions of the linear separation between each output 
neuron and the input neurons (Extended Data Fig. 1j). 

For the three example cells shown in Extended Data Fig. 1j, the unity- 
normalized overlaps between the three pairs of connection weight vectors 
($w_i \cdot w_j$) were 0.165, 0.022 and 0.038. The activity of cells in 
the output layer, $r$, was defined as: $r_i = [w_i \cdot (x + n)]$, where $x$ is the mean 
activity of the input cells in response to a given stimulus, $n$ is a noise 
term in which each element is Poisson-distributed with mean 0.1, and 
[] denotes rounding to the nearest integer. We simulated the activity 
of this two-layer network across 10,000 time bins and calculated the 
noise correlation coefficients between three different pairs of output 
neurons. 


Measurements of fluorescence scattering 

To examine the extent of fluorescence scattering between active image 
tiles within one temporal phase of our multiplexed imaging scheme 
(Extended Data Fig. 4d-g), we measured the spatial distribution func- 
tion, $P_s(x, y)$, governing the probability that a two-photon excited 
fluorescence photon will exit the cortical tissue surface at a point with 
lateral displacement coordinates (x, y) relative to the laser focus. To 
directly observe the distributions $P_s(x, y)$ of scattered fluorescence, we 
built a custom optical setup that used the Ti:sapphire laser beam to 
excite fluorescence in fixed cortical tissue slices from adult GCaMP6f- 
tTA-dCre mice and imaged the resulting distribution of fluorescence 
signals on a scientific grade CMOS camera (Orca Flash, Hamamatsu). 
Owing to the use of an imaging detector in this setup, the fluorescence 
detection pathway had to be optically corrected for field curvature 
and other image plane distortions, whereas the primary two-photon 
microscope (Fig. 1) had no such requirement. For this reason, our stud- 
ies of scattering used an Olympus XLUMPLFLN objective lens (0.95 NA, 
20x), which provided fluorescence images of ~1.2 mm in width. We 
positioned the laser focal spot on one side of the field of view, so as to 
image scattered fluorescence up to about 1.1 mm away from the focal 
spot (Extended Data Fig. 4e, f). We computed the mean $P_s(x, y)$ distribu- 
tion, averaged over 100 different locations of the laser focus in each 
of 3 different brain slices, at tissue depths up to 600 µm beneath the 
surface of the slice. To determine the mean cross-sectional distribution 
of fluorescence as a function of the radial distance from the laser focus, 
$r = \sqrt{x^2 + y^2}$, we also averaged over all accessible polar angles. To com- 
pute the probability that a fluorescence photon excited in one active 
tile would scatter into an adjacent active tile, we integrated the circu- 
larly symmetric determinations of $P_s(x, y)$ over the portion of the image 
area yielding this form of crosstalk (Extended Data Fig. 4g). 
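
The angular averaging step can be illustrated with a small sketch that reduces a two-dimensional image to a radial profile around a chosen centre. The function name, the bin count and the synthetic test image are assumptions; real data would also require background subtraction and normalization, which are not shown.

```python
import numpy as np

def radial_profile(image, centre, n_bins):
    """Average image intensity over polar angle as a function of distance from centre.

    centre is given as (x, y) in pixel coordinates. Only pixels present in the
    image contribute, so a focus near one edge is averaged over the accessible
    angles only.
    """
    yy, xx = np.indices(image.shape)
    r = np.hypot(xx - centre[0], yy - centre[1])
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    # Assign each pixel to a radial bin; the outermost pixel goes in the last bin.
    which = np.clip(np.digitize(r.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(which, weights=image.ravel(), minlength=n_bins)
    counts = np.bincount(which, minlength=n_bins)
    bin_centres = 0.5 * (edges[:-1] + edges[1:])
    return bin_centres, sums / np.maximum(counts, 1)

# Synthetic, circularly symmetric test image that decays with distance.
yy, xx = np.indices((101, 101))
image = np.exp(-np.hypot(xx - 50, yy - 50) / 10.0)
radii, profile = radial_profile(image, (50, 50), 20)
```

Integrating such a profile over the region occupied by a neighbouring tile would then give an estimate of the crosstalk probability.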


Measurements of brain temperature during two-photon brain 
imaging 

To perform temperature measurements in the brains of awake mice 
during two-photon imaging (Extended Data Fig. 2f), we surgically 
prepared GCaMP6f-tTA-dCre mice by performing a 5-mm-diameter 
craniotomy following the same procedures as described above. How- 
ever, before placement of the cranial window, we inserted a flexible 
200-µm-diameter thermocouple probe® (IT24P; Physitemp) into the 
brain, 100-200 µm beneath the dura, within ~0.75 mm of the centre of 
the field-of-view of the microscope. The thermocouple resided within 
a 5-mm-long plastic micropipette and extended ~2.5 mm beyond the 
tip of the micropipette. 

Using ultraviolet-light curable glue (Loctite, 4305) and dental 
cement, we affixed the micropipette to the cranium at a shallow angle of 
5° relative to the surface of the cranium. We then placed the glass cranial 
window onto the craniotomy and fixed the window in place with dental 
cement. The thermocouple probe was connected to a two-channel 
digital thermometer (CL3515R; Omega), which conveyed digitized 
temperature data (10 Hz sampling rate) to a computer via a USB port. 
We protected the wires of the thermocouple connecting to the digital 
thermometer using a 5-cm-long piece of flexible plastic tubing. We then 
commenced concurrent two-photon imaging (17.5 Hz image frame 
acquisition rate) and temperature recordings (Extended Data Fig. 2f). 


Histology 

To check whether in vivo two-photon imaging with the 16-beam instru- 
ment induced any brain tissue damage, we performed immunohisto- 
chemical analyses of post-mortem brain tissue sections (Extended Data 
Fig. 2g-i). We compared positive control tissue sections that we had 
deliberately damaged in vivo with high-power (2,680 mW mm⁻²) laser 
illumination, negative control tissue sections that received no laser illu- 
mination, and experimental tissue sections that had undergone in vivo 
two-photon imaging at the highest intensity levels of laser illumination 
(80 mW mm⁻²) used in this study for tracking neuronal Ca²⁺ dynamics. 

We euthanized and intracardially perfused the mice in all three 
groups with phosphate buffered saline followed by a 4% solution of 
paraformaldehyde in phosphate buffered saline. To allow adequate 
time for expression of HSP70 following exposure to laser illumination™, 
mice in the positive control and experimental groups were euthanized 
21 h after the end of two-photon imaging. We sliced the fixed brain tis- 
sue using a vibratome (Leica VT1000 S) to obtain 100-µm-thick coro- 
nal sections. We immunostained the tissue sections with antibodies 
against glial fibrillary acidic protein (1:2,500, rabbit anti-GFAP, 
Sigma HPA056030, Lot C115616) and heat shock protein 70 (1:400, 
mouse anti-HSP, Enzo ADI-SPA-810, Clone C92F3A-5, Lot 01031912) 
and then applied fluorophore-conjugated secondary antibodies (goat 
anti-rabbit-Alexa 594 (Invitrogen, A-11012, Lot 1933366) and goat anti- 
mouse-Alexa 488 (Invitrogen, A-11001, Lot 56881A)). 

We also stained the sections with DAPI (Invitrogen, D1306), which 
labels cell nuclei by binding to DNA. After mounting the brain sections 
on glass slides, we visualized immunofluorescence using an epifluores- 
cence macroscope (Leica, MZFL III) equipped with a plan 1.0× objective 
lens, a solid-state white light engine (Lumencor, Sola SM 5-LCR-VA), 
filter sets for imaging red and green fluorophores (Leica 10450756 and 
10450212, respectively) and a CCD camera (QImaging, 01-QIClick-F- 
M-12). Brain sections from all three groups were imaged under identical 
optical conditions and with the same camera settings. 


Statistical tests 

For comparison of the distributions of noise correlation coefficients 
in Fig. 2e and Extended Data Fig. 6g, we used two-tailed, two-sample 
Kolmogorov-Smirnov tests. In Figs. 2f, 3h and Extended Data Fig. 9c we 
used one-tailed Wilcoxon rank-sum tests. Supplementary Table 1 con- 
tains all P values associated with the figures and extended data figures. 


Instrument availability 

With support from the United States National Institute of Neurologi- 
cal Disorders and Stroke, we are currently converting the large-scale 
two-photon microscope (Fig. 1, Extended Data Fig. 2) into a research 
facility that is available to other laboratories and formally overseen 
by a steering committee. Researchers interested in this facility should 
please write to its principal investigator (M.J.S.) for more information. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The data that support the findings of this study are available from the 
corresponding authors upon reasonable request. 


Code availability 


We used open source software routines for image registration 
(http://bigwww.epfl.ch/thevenaz/turboreg/) and partial least squares analysis 
(https://www.mathworks.com/matlabcentral/fileexchange/18760-partial-least-squares-and-discriminant-analysis). 
Software code for extracting individual neurons and their Ca²⁺ activity 
traces from Ca²⁺ videos using principal component and then independent 
component analyses*>“* is freely available 
(https://www.mathworks.com/matlabcentral/fileexchange/25405-emukamel-cellsort), 
although for convenience we used a commercial version of these routines 
(Mosaic software, version 0.99.17; Inscopix). We wrote all other analysis 
routines in MATLAB (Mathworks; version 2017b). The primary software code 
used to support the findings of the study is available at Zenodo.org 
(https://zenodo.org/record/3593520#.XgWPu-hKg2w). 


Extended Data Fig. 1 | The discriminability of two sensory stimuli based on 
the activity patterns of two or more cells depends on the statistical 
relationship between the mean responses of the cells and their noise 
correlations, which in turn depends on visual neural circuitry. 

a-f, Schematics of the distributions of responses by two cells to two distinct 
stimuli in six different cases. Cyan dots indicate joint responses of the cell pair 
to stimulus 1; orange dots indicate responses to stimulus 2. Ellipses convey the 
shapes of the statistical distributions of the responses to each stimulus. Three 
types of noise correlation are depicted. In a and d, the two cells have 
statistically independent noise correlations. In b and e, the cells share 
positively correlated noise fluctuations. In c and f, the cells share negatively 
correlated noise fluctuations. In all six cases, dashed lines indicate optimal 
linear boundaries for stimulus discrimination. The information in a-f is based 
on similar plots published previously*"*°. a-c, When both neurons have similar 
stimulus-response properties (for example, as schematized, when both cells 
have a smaller mean response to stimulus 1 than stimulus 2), positively 
correlated noise fluctuations (b) increase the overlap between the two 
response distributions and thereby impair stimulus discrimination, whereas 
negatively correlated noise fluctuations (c) improve stimulus discrimination as 
compared to the case with independent noise fluctuations (a). d-f, When both 
neurons have opposite stimulus tuning (for example, as schematized, when 
neuron 1 responds more vigorously to stimulus 1 and neuron 2 responds more 
vigorously to stimulus 2), positively correlated noise fluctuations (e) decrease 
the overlap between the two response distributions as compared to the case 
with independent noise fluctuations (d) and thereby improve stimulus 
discrimination, whereas negatively correlated noise fluctuations (f) impair 
stimulus discrimination by increasing the overlap of the two response 
distributions. g, Cells in visual cortical areas, denoted by red circles, integrate 
signals from earlier stages of the visual pathway, as schematized by the input 
connections to two example cortical neurons. Thus, as visual information 
propagates through neural circuitry, noise fluctuations become correlated 
between cells with similar receptive fields, leading to an upper bound on the 
amount of information that a neural ensemble can encode. h, Example 
receptive fields for cells in g. Cells in early stages of the visual processing 
pathway have relatively simple receptive fields. Integration of their activity 
patterns leads to more complex visual receptive fields in downstream visual 
areas. Dashed boxes enclose receptive fields (right) for the two example cells 
marked in g, as well as the receptive fields of cells providing visual inputs (left). 
i, A network’s pattern of synaptic connectivity constrains the dimensionality of 
the activity in downstream visual circuits”. Left, in the early layers of the visual 
pathway, the dimensionality of ensemble activity is about the same order of 
magnitude as the number of photoreceptors. In downstream visual areas, due 
to the extraction of visual features, neural activity is constrained to a manifold 


of lower dimensionality (indicated by the red-shaded manifold in the space of 
all possible photoreceptor inputs). This manifold is determined by the set of 
receptive fields and hence the visual features that the downstream visual area 
detects. Grey ellipses (left) depict the distributions of photoreceptor 
responses to two distinct visual stimuli; after propagating through the visual 
circuitry these distributions are confined to the lower-dimensional manifold 
(red ellipses). Right, for a family of visual stimuli parameterized by a single 
variable, the mean neural ensemble responses lie along a corresponding tuning 
curve. Noise in the input circuitry propagates to downstream areas and leads to 
noise fluctuations in downstream neurons that are statistically correlated for 
cells with similar receptive fields. This, in turn, implies that the magnitude of 
noise fluctuations along the neural tuning curve becomes proportional to the 
number of cells in a neural ensemble and indistinguishable from the encoded 
visual signals, which also increase in proportion to the number of cells. This 
proportional growth of noise and signal ultimately limits the ability to 
discriminate two visual stimuli. Thus, for neural ensembles with more than a 
certain number of cells, the encoded information reaches an upper bound. 

j, We simulated a two-layer, linear feedforward neural network, to illustrate that 
information-limiting correlations are intrinsic to feed-forward neural networks 
with overlapping receptive fields”. Top, for three example output cells, the plot 
shows the synaptic weights of the inputs from cells in the first layer of the 
network. Bottom, diagram of connections between the two layers of the 
network. Symbols are defined as follows: x is the mean activity of cells in the 
first layer in response to a given stimulus; n is the noise in the activity of the 
input cells; r is the activity of the output cells. k, Digitized plots of spike counts 
for simulated activity in the network of j, for the two example input cells (yellow 
and black) and three example output cells (red, green, blue). The noise traces 
for the input cells came from independent Poisson random processes. External 
inputs to the network selectively drove either the yellow or the black cell, but 
owing to the presence of noise the two cells are occasionally active 
concurrently. l, Frequency plots of pairwise activity levels (rounded to the 
nearest integer) for pairs of output cells in the network of j. Yellow and black 
circles denote which of the two corresponding input cells received external 
input. The diameter of each circle denotes the number of time bins with a given 
pair of activity levels in the two cells. r values are noise correlation coefficients 
and are larger for pairs of output cells with greater overlap in their receptive 
fields. m, Plot of the distribution of activity responses in the output cell layer, 
for the three example cells coloured green, red and blue in j. Data points are 
coloured either yellow or black, to indicate whether the output activity is a 
response to stimulation of the yellow- or black-coloured cell in the input layer. 
The red plane denotes the optimal linear classification boundary between the 
two stimulation conditions. 


Extended Data Fig. 2| Spatiotemporal multiplexing of the illumination 
beams permits imaging of large fields of view at fast frame rates without 
thermal damage to brain tissue. a, Computer-assisted design of the 
mechanical layout of the two-photon microscope. Scale bar, 0.5 m. b, In the 
pixel multiplexing mode of imaging, each of the 16 beams is assigned to one 
of four different temporal phases within each cycle of the pixel clock (Extended 
Data Fig. 3b). Alternatively, in the line-multiplexing mode of imaging, only 8 of 
the 16 beam paths are used (Methods). In neither imaging mode are 
neighbouring beams ever active concurrently (Extended Data Fig. 3c), 
minimizing fluorescence scattering between active image tiles and allowing 
scattering into inactive image tiles to be corrected computationally (Extended 
Data Figs. 3d, e, 4a-g). c, To switch between the different sets of active beams, 
square-wave electronic signals control a set of three electro-optic modulators 
(EOMs). d, A Ti:sapphire laser provides ultrashort-pulsed infrared illumination. 
A half-wave (λ/2) plate and a polarizing beam-splitter enable power control. 
Three pairs of EOMs and polarizing beam-splitters direct the light into one of 
four main optical paths, with only one path illuminated during each of the four 
multiplexing phases. In each of these four main paths, three 50:50 beam- 
splitters create four beams of equal intensity, yielding up to 16 total beams but 
with only four on at any instant. A chopper blocks all light during the 
turnaround portion of the galvanometer scanning cycle. e, Seventy-five 
example fluorescence traces of Ca²⁺ activity in layer 2/3 pyramidal cells of an 
awake mouse. f, Maintaining brain temperature within physiological ranges 
during in vivo two-photon imaging requires a proper balance between heat loss 
through the cranial window and heating induced by the laser illumination®*°. 
To directly verify that our cranial window preparation and imaging conditions 
properly balanced these two opposing effects, we measured brain temperature 
during two-photon imaging with the 16-beam microscope. For these studies we 
used an implanted thermocouple” and either the highest (blue trace) or lowest 
(green trace) time-averaged laser illumination intensity used for Ca²⁺ imaging 
elsewhere in this study (Methods). Consistent with previous work, before laser 
illumination commenced the brain temperature was about 9 °C below normal 
mouse body temperature®, a state that is considered to be neuroprotective. 
By about 100 s after the start of imaging, brain temperatures attained steady- 
state values within the physiological range of C57BL/6 mice*’ (grey shaded 
region; 36.3 °C-38.7 °C). Each trace is an average of three bouts of imaging for 
each of three separate mice. Coloured shading denotes the s.d. across the 9 
individual measurements acquired at each illumination intensity. 

g-i, Fluorescence immunohistochemical analyses of tissue damage markers. 
To check whether in vivo imaging of brain tissue with the 16-beam instrument 
(4 mm² field of view) induced any tissue damage, we immunostained post- 
mortem brain tissue sections using antibodies to two different damage 
markers, glial fibrillary acidic protein (GFAP) and heat shock protein 70 
(HSP70), previously identified as indicators of laser-induced tissue damage”. 
We also stained the sections with DAPI, which labels cell nuclei. We compared 
positive control tissue sections (g) that we had deliberately damaged in vivo 
with high-power (2,680 mW mm⁻²) laser illumination, negative control sections 
(h) that received no laser illumination, and experimental tissue sections (i) that 
had undergone in vivo two-photon imaging at the highest level of laser 
illumination (80 mW mm⁻²) used in this study for tracking Ca²⁺ dynamics in 
neocortical layer 2/3 pyramidal neurons. Together, these analyses verified the 
functionality of the antibodies and revealed no signs of tissue damage from 
two-photon imaging. To image neurons in cortical layers deeper than layer 2/3, 
users have several options for doing so without delivering excess heat to the 
brain (Supplementary Video 3, Supplementary Note). Scale bars, 500 µm. 
Results shown are representative of those from 8 cerebral hemispheres of 4 
different mice. j, k, Comparisons between recent large-scale two-photon 
microscopes”*”°. The performance of a laser-scanning microscope closely 
relates to four main parameters: the scanner speed, image-frame acquisition 
rate, field of view, and pixel size (Supplementary Note). For microscopes that 
use a single laser beam to sweep in two dimensions across the field of view, 
these parameters obey the relationship FOV = d × v × f⁻¹, where FOV is the field- 
of-view area, d is the spacing between adjacent image lines (or equivalently the 
pixel width along the slow-axis of laser-scanning), v is the speed at which the 
beam is swept across the specimen by the fast-axis scanner, and f is the image- 
frame acquisition rate. By comparison, our approach using four active beams 
leads to an expression for the maximal field of view, FOV = 4 × d × v × f⁻¹. These 
relationships enable performance comparisons with other recently published 
large-scale two-photon microscopes”*”~. To illustrate, j shows a plot of the 
image-frame acquisition rate against the field-of-view area, given a line spacing 
of d = 1.15 µm. k shows how the image-frame acquisition rate depends on d for a 
4 mm² field of view. Solid red circles denote the performance of our 
microscope in its line-multiplexing imaging mode using an 8-kHz resonant 
galvanometer (Methods). Black data points denote performance options of 
another large two-photon microscope, which uses a pair of laser beams with 
temporally interleaved pulses”, as calculated on the basis of its published 
capabilities. Blue data points and associated blue dashed lines show 
performance options for a third large-scale microscope”, as calculated on the 
basis of its published capabilities. 
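The scan relationship described for j and k can be checked numerically. The sketch below rearranges FOV = n × d × v / f for the frame rate f, where n is the number of simultaneously active beams (1 or 4 in the caption). The function name and the fast-axis speed used in the example are hypothetical; only d = 1.15 µm and the 4 mm² field of view come from the caption.

```python
def frame_rate_hz(fov_um2, line_spacing_um, scan_speed_um_per_s, n_beams=1):
    """Frame rate implied by FOV = n_beams * d * v / f, rearranged for f."""
    if fov_um2 <= 0:
        raise ValueError("field-of-view area must be positive")
    return n_beams * line_spacing_um * scan_speed_um_per_s / fov_um2


# d and FOV follow the caption; the scan speed is an illustrative assumption.
d_um = 1.15
fov_um2 = 4.0e6        # 4 mm^2 expressed in um^2
v_um_per_s = 8.0e6     # hypothetical fast-axis sweep speed

single_beam = frame_rate_hz(fov_um2, d_um, v_um_per_s, n_beams=1)
four_beams = frame_rate_hz(fov_um2, d_um, v_um_per_s, n_beams=4)
print(f"1 beam: {single_beam:.2f} Hz, 4 beams: {four_beams:.2f} Hz")
```

For a fixed line spacing and scan speed, four active beams give four times the frame rate of a single beam over the same area, or equivalently four times the area at the same frame rate.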
Extended Data Fig. 3 | Data acquisition and post-processing for two-photon
imaging with 16 time-multiplexed excitation beams. a, Block diagram of the
electronics for data acquisition and instrument control. PMT, photomultiplier
tube; Pre-amp, pre-amplifier; ADC, analogue-to-digital converter; FPGA,
field-programmable gate array; EOM, electro-optic modulator. b, Computer
simulation of signal sampling in different stages of the pipeline in a. The ADC
samples the analogue, pre-amplified and low-pass filtered signals (blue) from
one of the PMTs at a rate of 5 × 10⁷ samples per second. In each of the four
temporal phases, the FPGA sums the digitized signals (red) from the ADC to
yield the fluorescence intensity values of each image pixel (grey). c, Raw
fluorescence images for each of the four excitation phases, acquired in an
awake mouse expressing GCaMP6f in layer 2/3 cortical pyramidal cells and
averaged over 100 frames (7.23 Hz acquisition rate). In each of the four phases,
a distinct set of four PMTs detects most of the fluorescence emissions, creating
four active image tiles within the 4 × 4 array. (Each of the four PMTs corresponds
to one of the four laser beams that is active in that phase.) To illustrate, the four
active tiles within the phase I image are shaded with a different colour (shaded
large square regions). However, close to the boundaries of each active tile,
some fluorescence photons are detected by the other 12 PMTs. During signal
unmixing these photons are reassigned to corresponding pixels in the correct
adjacent active image tile. For instance, within the phase I image, photons
detected in the areas outlined in colour (rectangles and small squares) are
reassigned to the colour-corresponding active tiles. d, An image compiling the
four sets of four active image tiles from the panels in c. e, During signal
unmixing, we reassign scattered fluorescence photons to their correct pixels of
origin, using the method shown in c, by reassigning the boundary regions of
128 pixels width. The resulting image is displayed with the mean contrast
equalized across tiles. Scale bars: c, e, 500 µm.
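The sample-summing step in b can be illustrated with a few lines of NumPy. This is a minimal sketch, not the FPGA logic: it assumes consecutive, non-overlapping blocks of ADC samples per pixel, with the block length taken as 50 MHz × 400 ns = 20 samples (the dwell time is quoted in the Extended Data Fig. 5 caption). How samples are partitioned among the four temporal phases is not modelled, and the function name is hypothetical.

```python
import numpy as np

ADC_RATE_HZ = 50e6       # 5 x 10^7 samples per second
DWELL_TIME_S = 400e-9    # per-pixel laser dwell time
SAMPLES_PER_PIXEL = int(round(ADC_RATE_HZ * DWELL_TIME_S))


def sum_samples_into_pixels(samples, samples_per_pixel=SAMPLES_PER_PIXEL):
    """Sum consecutive blocks of digitized samples into one value per pixel.

    Trailing samples that do not fill a whole pixel are discarded.
    """
    samples = np.asarray(samples, dtype=float)
    n_pixels = samples.size // samples_per_pixel
    trimmed = samples[: n_pixels * samples_per_pixel]
    return trimmed.reshape(n_pixels, samples_per_pixel).sum(axis=1)


# Example: 105 unit-valued samples fill five complete pixels.
pixels = sum_samples_into_pixels(np.ones(105))
print(SAMPLES_PER_PIXEL, pixels)
```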
Extended Data Fig. 4 | Crosstalk un-mixing procedure for reconstructing
the full field-of-view enables accurate estimation of neural activity traces. 
a, To quantify the extent of fluorescence scattering across image tiles, we 
acquired images in two distinct configurations that enabled us to distinguish 
fluorescence signals from any crosstalk due to fluorescence scattering across 
image tiles. Using an awake mouse expressing GCaMP6f in layer 2/3 cortical
pyramidal cells, we first imaged with only one active laser beam and its 
corresponding PMT; the other 15 beams were blocked (configuration 1). In this 
configuration, there is no fluorescence scattering into the active image tile 
from the other 15 tiles, only the signals from the active tile. In configuration 2, 
we blocked the beam that had previously been active, unblocked the other 15 
beams, operated the microscope with the normal multiplexing approach, and 
again sampled signals from all 16 PMTs. To estimate the extent of scattering 
into the tile with the blocked beam, we applied the computational un-mixing 
procedure to the raw image data. To estimate how much scattered
fluorescence affects cell sorting, we first extracted individual cells and their
Ca²⁺ activity traces from the first dataset, attained in configuration 1 without
crosstalk. We then summed the images, frame by frame, from the two datasets, 
to create a mock dataset comprising unscattered plus scattered fluorescence 
signals, from which we again computationally extracted cells and their activity 
traces. This enabled a direct comparison between two datasets containing the 
exact same patterns of neural activity, with and without fluorescence 
scattering from other image tiles. b, Activity traces for four example cells, 
enabling comparisons of the Ca²⁺ activity traces (top), ΔF(t)/F₀, and the
resulting traces of the estimated spike counts (bottom), between the datasets
with (red traces) and without (black traces) inter-tile scattering. The traces with
and without inter-tile scattered fluorescence signals are nearly
indistinguishable by eye. c, Histogram of the ratio of estimated spikes for the
two datasets constructed in a, for all time bins (0.14 s per time bin) with an
estimated spike count greater than 0.5. The mean ratio is 1.0 ± 0.06
(mean ± s.d.; N = 31 cells). Total number of time bins, 5,865. d–g, Studies of
fluorescence scattering between the active image tiles in one temporal phase
(Extended Data Fig. 2b) of the multiplexing scheme used for two-photon
imaging. Throughout the paper, we corrected computationally for 
fluorescence scattering from active to inactive image tiles within each 
temporal phase of imaging (Extended Data Fig. 3c, Methods). This approach 
neglects the small amount of fluorescence scattering from active tiles to other 
active tiles, which in principle could also be computationally corrected using a 
more sophisticated method than the one we adopted. Hence, we examined 
experimentally the validity of our computational approach and the extent to 
which scattering between active tiles can be justifiably neglected. The 
amplitude of scattering between active tiles (d) varies with the location of each 
laser beam and its proximity to a tile boundary. We used fixed cortical tissue
slices from adult GCaMP6f-tTA-dCre mice to measure the amplitude of such 
scattering effects when imaging at different depths within brain tissue. An 
image (e) of the spatial distribution of two-photon fluorescence excited
500 µm deep within a tissue slice shows that a majority of scattered
fluorescence photons exits the brain tissue relatively near to the laser focus. By 
averaging over 100 different laser foci positions in each of 3 different brain 
slices, we determined the mean cross-sectional spatial profiles (f) of scattered 
fluorescence excited at different depths in tissue, as a function of the lateral 
displacement, x, from the laser focus. Profiles are shown normalized to unity at 
x = 0. The inset of f shows a magnified view of these cross-sectional profiles for
x ∈ [−1,000 µm, −500 µm], that is, up to 1 mm away from the laser focus. We used
these empirically determined scattering profiles to compute the probability
(mean ± s.d.; N = 300 laser focus positions) (g) that a fluorescence photon
originating in one active image tile would scatter into an adjacent active tile.
Even when the laser focus is on the boundary of an image tile, this probability
remains less than 0.02 for all tissue depths < 600 µm. For our studies of layer
2/3 cortical pyramidal cells in live mice, the probability of a fluorescence 
photon scattering between active tiles is less than 0.01. In conclusion, 
computational corrections for fluorescence scattering that account solely for 
scattering from active to inactive tiles—and neglect scattering between 
different active tiles—are empirically well justified. 
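The comparison in c amounts to a per-bin ratio of estimated spike counts between the two datasets. The sketch below is one way to compute it, assuming the 0.5 threshold is applied to the crosstalk-free estimate (the caption does not say which dataset the threshold refers to). The function and argument names are hypothetical.

```python
import numpy as np


def spike_ratio_stats(spikes_without, spikes_with, min_count=0.5):
    """Ratio of spike estimates (with / without scattering) per time bin.

    Only bins whose crosstalk-free estimate exceeds min_count are used.
    Returns (mean ratio, s.d. of ratio, number of bins used).
    """
    without = np.asarray(spikes_without, dtype=float).ravel()
    with_scatter = np.asarray(spikes_with, dtype=float).ravel()
    keep = without > min_count
    ratios = with_scatter[keep] / without[keep]
    return ratios.mean(), ratios.std(), int(keep.sum())


# Toy example with four time bins; two of them pass the threshold.
mean_ratio, sd_ratio, n_bins = spike_ratio_stats(
    [0.0, 1.0, 2.0, 0.4], [0.1, 1.1, 1.9, 0.5]
)
print(mean_ratio, sd_ratio, n_bins)
```

A mean ratio close to 1 with a small spread indicates that the added scattered fluorescence barely changes the spike estimates.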
Extended Data Fig. 5 | Pipeline of offline data processing and procedures for
reducing the dimensionality of the neural ensemble activity data and 
calculating the decoding accuracy. a, Pipeline of the offline procedures we 
applied to the acquired fluorescence signals to attain traces of neural activity. 
Steps coloured purple involve algorithms that use raw or processed image 
data. Steps coloured yellow involve algorithms that use cells’ spatial filters as 
their input arguments. Steps coloured green involve algorithms that use cells’ 
activity traces as their inputs. Purple steps, starting from the raw 
photocurrents from each of the 16 PMTs (sampled at 50 MHz and assigned to
individual image pixels corresponding to a 400-ns laser dwell time), we
normalized the photocurrent signals by the gain of each individual PMT, to
equalize the image intensity scale across the entire image. We then un-mixed
scattered fluorescence, as shown in Extended Data Fig. 3, and applied an image
registration routine (TurboReg) to the videos from the individual image tiles.
To highlight Ca²⁺ transients against baseline fluctuations, we used the fact that
the two-photon fluorescence increases of GCaMP6 during Ca²⁺ transients are
many times the s.d. of background noise. Thus, we converted the fluorescence
trace of each pixel, F(t), into a trace of z-scores, ΔF(t)/σ. Here ΔF(t) = F(t) − F₀
denotes the deviation of the pixel from its mean value, F₀, and σ denotes the
background noise of the pixel, which we estimated by taking the minimum of all
standard deviation values calculated within a sliding 10-s window. After
transforming the movie data into this ΔF(t)/σ form, we identified neural cell
bodies and processes using an established cell-sorting algorithm that 
sequentially applies principal and independent component analyses (PCA and 
ICA) to extract the spatial filters and time traces of individual cells. Yellow
steps, for all spatial filters corresponding to individual cell bodies, we
thresholded the filters at 5% of each filter’s maximum intensity and set to zero
any filter components with non-zero weights outside the soma. To attain neural
activity traces, we then reapplied the set of resulting filters to the ΔF(t)/F₀
movies. Green steps, to estimate the most likely number of spikes fired by each
cell in each time bin, we applied a fast non-negative deconvolution algorithm to
the ΔF/F₀ trace of the cell. For each neuron, we down-sampled (2×) the activity
traces to time bins of 0.275 s by averaging the values within adjacent time bins.
To make comparisons across similar behavioural states, we removed all trials 
during which the mouse was moving. b, Neural responses for each visual 
stimulus (A and B) are represented as matrices of size N_neurons × N_trials × N_timebins. To
calculate the accuracy of stimulus discrimination, we first randomly chose a
subset of neurons from the dataset. For decoding using the ‘instantaneous’
strategy (Fig. 3, Extended Data Figs. 7-10), we then chose a specific time bin,
whereas for the ‘cumulative’ decoding strategy we treated all the different time
bins up to a specific time, t, as independent dimensions of the population
activity vector. We then split the trials in half, into a training set and a test set,
each with equal numbers of trials with the A and B stimuli. We took the neural
activity traces in the training set and normalized them by the s.d. of the cell’s
activity about its mean, to create a set of z-score traces. We then performed
PLS analysis to identify a low-dimensional basis that well captured the
separation between the neural responses to the two sensory stimuli. Using the
activity data in the test set, we applied the same normalization and dimensional 
reduction procedures and values as for the training set. We used the resulting 
distributions of responses to calculate d’ values and the eigenvectors of the 
noise covariance matrix. For each mouse we repeated this entire procedure for 
100 different randomly chosen subsets of neurons. 
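The per-pixel z-scoring in a (ΔF(t)/σ, with σ taken as the smallest standard deviation found in any sliding 10-s window) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, the window is converted to frames by rounding, and a trace with a perfectly flat window (σ = 0) is not handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def zscore_trace(f, frame_rate_hz, window_s=10.0):
    """Convert a fluorescence trace F(t) into (F(t) - F0) / sigma.

    F0 is the mean of the whole trace; sigma is the minimum standard
    deviation over all sliding windows of length window_s.
    """
    f = np.asarray(f, dtype=float)
    win = int(round(window_s * frame_rate_hz))
    win = min(max(win, 2), f.size)           # keep the window usable
    sigma = sliding_window_view(f, win).std(axis=1).min()
    return (f - f.mean()) / sigma


# Toy trace: unit-amplitude baseline fluctuations plus one large transient.
trace = np.tile([1.0, -1.0], 100)
trace[50:60] += 20.0
z = zscore_trace(trace, frame_rate_hz=1.0)
print(z.max())
```

Taking the minimum over windows keeps σ tied to quiet baseline periods, so windows that contain transients do not inflate the noise estimate.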
Extended Data Fig. 6 | Distributions of pairwise noise correlation
coefficients do not differ significantly between pyramidal neurons in area
V1 and higher-order visual areas. a, Anatomical maps of visual cortical
neurons that responded to each of the two stimuli. For these maps (but for no 
other analyses in the paper), we denoted a cell as responsive to one of the 
stimuli if, in at least one time bin during the 2-s-stimulation period (0.275 s per 
bin), the difference between the cell’s mean response and its mean activity 
trace during the inter-trial intervals was more than twice the sum of the s.e.m.
values for these two traces. Cells that responded to stimulus A only are shown
red, those that responded to stimulus B only are shown blue, and those that
responded to both stimuli are shown purple. b, Mean Ca²⁺ responses (ΔF/F) of
25 example neurons to the two different moving grating stimuli, oriented
at ±30°. Ca²⁺ activity traces are shown coloured during the stimulation period
(marked with light grey shading) and black otherwise. Coloured shading about
each trace denotes the s.e.m. over 217 trials of each type. The inset shows a
schematic of the two stimuli, which appeared for 2 s per trial and were
presented in random order. c, d, Histograms of the estimated mean spiking
rates of individual neurons during visual stimulation (c) and the absolute
values of the differential responses of the individual neurons to the two visual
stimuli, |R_A − R_B|/(R_A + R_B) (d), where R_A and R_B denote the mean responses of a
cell to stimuli A and B, respectively. The distributions of cells’ activity rates and
preferences for one stimulus over the other were consistent with previous
studies of rodent visual cortical neurons. Data shown are for N = 8,029
individual cells from N = 5 mice. Error bars are s.d. as estimated on the basis of
counting errors. e, Histogram of noise correlation coefficients, r, between 
pairs of layer 2/3 pyramidal neurons, computed as in Fig. 2d, for V1 cell pairs 
(dashed lines) and cells pairs in higher-order visual areas (solid lines). The 
histograms show mean values across the two different visual stimuli for both 
the real neural activity traces, and for trial-shuffled data in which each cell’s
responses to each stimulus presentation were randomly permuted across the
set of all presentations of the same stimulus. r values were computed on the
basis of cells’ responses integrated over t = [0.5 s, 2 s] from the start of each
trial. Histogram bin, 0.01. (N = 1,331,109 V1 cell pairs from 5 mice; N = 2,428,437
cell pairs from higher-order visual areas in 5 mice). f, Box-and-whisker plots of 
the mean and FWHM values of the distributions in e (real data only). Both 
statistical metrics are similar for the two classes of visual cortical neurons. 
Open circles denote individual data points for N=5 mice. g, h, Histograms (g) 
and cumulative probability distributions (h) of noise correlation coefficients 
for all cell pairs (based on all recorded V1 and higher-order visual cortical
neurons) with similar or differently tuned mean evoked responses to the two 
visual stimuli. Unlike Fig. 2e, which shows these distributions for only the most 
active cells (the highest decile), here the distributions include all cell pairs with 
either positively (red curves) or negatively (blue curves) correlated mean 
responses to the two stimuli. Within these two groups of cell pairs, we 
computed the noise correlation coefficient, r, for each cell pair. Owing to the 
extremely large number of cell pairs, the two distributions of r values differed
significantly (***P< 107 for all 5 individual mice; two-tailed Kolmogorov- 
Smirnov test; 3,482,186 positively correlated cell pairs in total; 3,464,094 
negatively correlated pairs), even though the effect size was tiny and the two 
distributions were nearly identical. This result shows the difficulty of detecting 
information-limiting correlations by measuring pairwise noise correlations, 
because the variance in the individual r values is much greater than the 
difference between the mean values of the two distributions. i, Box-and- 
whisker plots of the mean values of the correlation coefficients in g, h. Open
circles mark individual data points for N = 5 mice. b–i are based on 217-332 trials
per stimulus condition in each of 5 mice. In f, i, boxes cover the middle 50% of
values, horizontal lines denote medians, and whiskers span the full range of the 
data. 
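The pairwise noise correlations in e and the trial-shuffled control can be sketched in NumPy. The code below is a simplified illustration, assuming one response value per cell per trial for a single stimulus (for example the response integrated over the analysis window); the function names and the toy data are hypothetical.

```python
import numpy as np


def noise_correlations(responses):
    """Pairwise Pearson r of trial-to-trial fluctuations.

    responses: array of shape (n_trials, n_cells) for one stimulus.
    Returns the r values for all distinct cell pairs.
    """
    responses = np.asarray(responses, dtype=float)
    residuals = responses - responses.mean(axis=0)   # remove mean response
    r = np.corrcoef(residuals, rowvar=False)
    upper = np.triu_indices(r.shape[0], k=1)
    return r[upper]


def trial_shuffle(responses, rng):
    """Permute each cell's responses independently across trials."""
    responses = np.asarray(responses, dtype=float)
    shuffled = np.empty_like(responses)
    for cell in range(responses.shape[1]):
        shuffled[:, cell] = rng.permutation(responses[:, cell])
    return shuffled


# Toy population: 20 cells sharing one common noise source over 500 trials.
rng = np.random.default_rng(0)
toy = rng.standard_normal((500, 1)) + rng.standard_normal((500, 20))
r_real = noise_correlations(toy)
r_shuffled = noise_correlations(trial_shuffle(toy, rng))
print(r_real.mean(), r_shuffled.mean())
```

Shuffling keeps each cell's own response distribution intact while destroying correlations between cells, which is why the shuffled histogram serves as the reference.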
Extended Data Fig. 7 | Temporal integration of neural activity improves
decoding performance, but quadratic and linear decoding yield identical
biological conclusions. a–c, To identify how many PLS dimensions were
needed to determine d’ accurately, we divided data from each of 5 mice into
three equally sized portions. We performed PLS analysis using trials in the first
third. Onto the PLS dimensions thereby identified, we projected the neural
ensemble activity in the second third of the data (training data). We retained
only the first N_R dimensions of this projection and computed d’ in the reduced
space (magenta data points) by identifying a hyperplane for optimal stimulus
discrimination. Finally, we applied this discrimination strategy to the
remaining third of the data (test data) and again calculated d’ (grey points).
Plots show mean values of d’ as a function of N_R for the interval [0.83 s, 1.11 s]
from stimulus onset (N = 5 mice; error bars denote s.d. across 100 different
subsets of 1,000 neurons per mouse). We normalized d’ values to that found for
N_R = 5 on the test dataset. For N_R > 5, discrimination performance declines
owing to overfitting for all discrimination strategies: instantaneous (a),
cumulative (b) and integrated (c). Hence, throughout the rest of the study we
used N_R = 5 for all calculations of d’. d, Pearson correlation coefficients between
the optimal linear decoding weights attained using instantaneous decoding at 
different time bins after the onset of grating stimuli (±30° orientations). These
weights were highly correlated for different time bins, especially across the
interval [0.5 s, 2 s], during which d’ reaches a plateau. Further, optimal decoders
for each time bin yielded nearly equivalent decoding performance when
applied to data from other time bins. For instance, the optimal decoder for the
fourth time bin (t = 0.97 s), when applied to any other of the last five time bins,
yielded a performance within less than 2% of that of the optimal instantaneous
decoder in all mice. When applied to the first and second time bins, the decoder
from the fourth time bin yielded decoding performances that were,
respectively, 83 ± 11% and 90 ± 3% (mean ± s.d.; N = 5 mice; 217-232 trials per
stimulus) of that of the optimal decoders. e, Plots of d’ versus time after
stimulus onset, for instantaneous and cumulative decoding strategies (Fig. 3).
For each mouse that viewed gratings oriented at ±30°, we chose 100 random
subsets of 1,000 cells and normalized d’ values by those obtained using a
time-integrated decoding strategy, which involved optimal linear discrimination
over one interval, [0.28 s, 1.94 s], covering most of the visual stimulation
period. Green traces, mean d’ values for individual mice using a time bin of 275
ms. Error bars, s.d. across 5 mice. f, In the five-dimensional space used after
truncating ensemble neural responses to the five leading PLS dimensions, the 
distributions of noise in the responses to the two stimuli were highly similar. 
Specifically, non-diagonal elements, Σ_ij, of the noise covariance matrices for
the two stimulus conditions were highly correlated (r: 0.81 ± 0.16; mean ± s.d.;
N = 5 mice), as computed for the interval [0.83 s, 1.11 s] after stimulus onset. This
similarity argues that a linear discrimination strategy to classify the two sets of
ensemble neural responses is near optimal, as confirmed in h. Values of Σ_ij are
plotted as mean ± s.d., computed across 100 different randomly chosen
subsets of 1,000 neurons per mouse. g, Using optimal linear decoding,
d’ values saturated as the number of trials analysed increased. Colours denote
individual mice. Data points were calculated for the interval [0.83 s, 1.11 s] after
stimulus onset. Error bars, s.d. across 100 different randomly chosen subsets
of 1,000 cells per mouse and stimulation trials. h, To check whether our results
depended on our use of linear decoding, we tested whether quadratic decoding 
might yield different conclusions. We examined the KL divergence, a
generalization of (d’)² that makes no assumption about the statistical
distributions under consideration. We computed the KL divergence, which
equals (d’)² for linear decoders, by using Gaussian approximations to the
distributions of ensemble neural responses to the two different stimuli, and we
plotted the results as a function of the number of cells, n, in the ensemble. First,
to recapitulate our determinations of (d’)² (magenta data points), we computed
the KL divergence under the assumption that the two different response
distributions had distinct means but identical noise covariance matrices,
which we estimated as the mean noise covariance matrix averaged over the two
different stimulus conditions. This is equivalent to computing (d’)². Next, we
relaxed the assumption that the two noise covariance matrices were equal and
computed the KL divergence between the distributions of neural responses to
stimulus B relative to those to stimulus A (blue points), and vice versa (red
points) (Methods). For all mice, KL divergence values saturated with increasing
n and, except in one mouse, were not much larger than (d’)² values. Thus,
quadratic decoders (which are optimal for discriminating two Gaussian 
distributions with different means and covariances) will yield the same basic 
conclusions as linear decoders (which are optimal for discriminating two 
Gaussian distributions with the same covariance matrix). Data points and error 
bars denote mean ± s.d. values computed in each mouse across 50 different
randomly chosen subsets of cells and assignments of visual stimulation trials 
to decoder training and testing (Extended Data Fig. 5b). i, Mean neural 
responses, averaged across all cells, to stimuli A (top) and B (bottom) for the 
first and second halves of the experimental trials in each mouse. Error bars, s.d. 
across the set of trials. j, d’ values computed for each mouse using 
instantaneous decoders trained on the first half of the trials and tested on the
second half (x axis), plotted with d’ values for an instantaneous decoder trained
on the second half of the trials and tested on the first half (y axis). a–j are based
on 217-332 trials per stimulus condition in each of 5 mice. 
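The equal-covariance case described in h, in which the shared noise covariance is estimated as the mean of the two per-stimulus covariance matrices, corresponds to the standard expression (d’)² = Δμᵀ Σ⁻¹ Δμ. The sketch below computes this quantity directly from two sets of responses in a low-dimensional space. It is an in-sample illustration only and omits the separate training and test trials used in the study; the function name and toy data are hypothetical.

```python
import numpy as np


def d_prime_squared(resp_a, resp_b):
    """(d')^2 = dmu^T Sigma^-1 dmu with a shared noise covariance.

    resp_a, resp_b: arrays of shape (n_trials, n_dims), one per stimulus.
    Sigma is the mean of the two per-stimulus sample covariance matrices.
    """
    resp_a = np.asarray(resp_a, dtype=float)
    resp_b = np.asarray(resp_b, dtype=float)
    dmu = resp_a.mean(axis=0) - resp_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(resp_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(resp_b, rowvar=False))
    sigma = 0.5 * (cov_a + cov_b)
    return float(dmu @ np.linalg.solve(sigma, dmu))


# Toy data: identical noise around two means that differ along dimension 0.
noise = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
stim_a = noise + np.array([1.0, 0.0])
stim_b = noise - np.array([1.0, 0.0])
print(d_prime_squared(stim_a, stim_b))
```

Using `np.linalg.solve` avoids forming the explicit inverse of the covariance matrix, which is the usual numerical preference when only the quadratic form is needed.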
Extended Data Fig. 8 | PLS-based decoding methods are robust to
multiplicative gain modulation and common mode fluctuations in the
neural ensemble dynamics and yield identical conclusions to regularized
regression. a, b, To test whether PLS analysis and dimensionality reduction
might lead to underestimates of d’, we compared d’ values determined using an
L2-regularized regression (L2RR) performed in the full space of neural
responses (a) to those found by PLS analysis (b). The two methods yielded
similar estimates of d’, which both saturated with increasing numbers of
neurons. Plots show d’ values (mean ± s.d.) for neural responses within [0.83 s,
1.11 s] after stimulus onset, computed across 100 different randomly chosen
subsets of neurons and visual stimulation trials (Extended Data Fig. 5b). For PLS 
analyses, we used half of the trials in each subset for decoder training and the 
other half for testing. For L2RR we used 90% of the trials in each subset to 
determine the regression vector and the other 10% to determine d’. We varied 
the regularization parameter, k, within [1, 10°] and used the maximum d’ value 
so obtained, as determined independently for each mouse, subset of neurons, 
and subset of trials (217-332 trials per stimulus condition in each of 5 mice). 
c–h, The conclusions of our study depend on comparisons of decoding
performance between real and trial-shuffled datasets. Thus, we checked 
whether our PLS-based decoding methods would robustly detect information- 
limiting correlations in models in which such correlations were present but 
weak; avoid reporting information-limiting correlations in models lacking such 
correlations; and be robust to the potential presence of other strong sources of 
neural trial-to-trial variability—such as common mode fluctuations and 
multiplicative gain modulation—even when they make an order-of-magnitude 
greater contribution to neural variability than the information-limiting noise 
fluctuations. We studied these issues using two different computational 
models (Methods). For both models we plotted empirically determined (d’)²
values as a function of the number of neurons in the ensemble. We compared
determinations of (d’)² using PLS-based decoding and those made using L2RR
to the actual ground truth values of (d’)² in each model. In each panel, the top
and bottom plots show results for unshuffled and trial-shuffled datasets,
respectively. Data points and error bars denote mean ± s.d. values across 30
different simulations. To examine the combined effects of information- 
limiting noise correlations and common mode fluctuations (c-f) we studied a 
model of neural ensemble responses in which the noise covariance matrix
exhibited information-limiting noise correlations via a single eigenvector f, the
eigenvalue of which grew linearly with the number of cells in the ensemble. In
addition to this rank 1 component, we included a noise term that was
uncorrelated between different cells, as well as a common mode fluctuation,
yielding a noise covariance matrix with the form Σ = σ²I + ε_common J + ε fᵀf, where
σ² = 1 is the amplitude of uncorrelated noise, I is the identity matrix, J is a rank 1
matrix of all ones, reflecting a common mode fluctuation, and f is the
information-limiting direction, a vector that we chose randomly in each
individual simulation from a multi-dimensional Gaussian distribution with
unity variance in each dimension. The amplitude of information-limiting
correlations was ε = 0.002, approximately matching the level observed in the
experimental data. We chose the difference in the means of the two stimulus
response distributions, Δμ, to be aligned with f (Fig. 3a) and to have a
magnitude of 0.2 so that the asymptotic value of d’ for large numbers of cells
approximately matched that of the data. We compared decoding results
attained with and without the presence of the common mode fluctuations in
the neural responses. In the version of the model without common mode
fluctuations, we set ε_common to zero. In this case (c) both PLS- and L2RR-based
decoders correctly detected the saturation of information in the real data but
not in trial-shuffled datasets. (See Extended Data Fig. 10h, k for theoretical
results showing how the accuracy of d’ estimates from PLS analysis depends on
the numbers of neurons and experimental trials in this particular model.) To
verify that our methods would not incorrectly report an information saturation
when it was in fact absent, we next set ε = 0 and confirmed that in the absence of
information-limiting noise correlations (d), neither decoder detected a
saturation of information in the real or shuffled data. In the version of the
model with common mode fluctuations, we set ε_common = 0.02, ten times the
value of ε = 0.002. In this case (e), both PLS- and L2RR-based decoders correctly
detected the information saturation in the real but not in the shuffled data. To
verify that common mode fluctuations alone cannot induce an illusory
saturation of information (f), we set ε = 0 while maintaining ε_common = 0.02 and
confirmed that neither PLS- nor L2RR-based decoders reported an illusory 
information saturation. Overall, these results indicate that our methods 
accurately detect the presence of weak information-limiting correlations 
buried within common mode noise that can be an order of magnitude larger, 
without falsely detecting information-limiting correlations when they are 
absent. To study the possible effects of multiplicative gain modulation (g, h), 
we compared two versions of a model in which the responses of the V1 neural 
population either were or were not subject to a multiplicative stochastic gain
modulation but were otherwise statistically equivalent. We modelled the V1
cell population as a set of Gabor filters (see Appendix section 5). In the model
version with gain modulation, on each visual stimulation trial we multiplied the
output of each Gabor filter by a randomly chosen factor, uniformly distributed
between 50% and 150%, the value of which was the same for all cells but varied from
trial to trial. In the model version without gain modulation (g) both PLS- and
L2RR-based decoders detected the information saturation in the real but not in
the trial-shuffled datasets. When we added global gain modulation to the
model (h) both decoders correctly found the information saturation in the real
but not in the shuffled datasets.
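The ground-truth behaviour of the first model (c–f) can be reproduced from its covariance matrix alone. The sketch below builds the covariance from an uncorrelated term, a common mode term (all-ones matrix) and a rank-1 term formed from the outer product of f with itself, then evaluates (d’)² = Δμᵀ Σ⁻¹ Δμ. Two points are assumptions made for illustration: Δμ is taken as 0.2 × f, which is one reading of "aligned with f with a magnitude of 0.2", and trial shuffling is represented by keeping only the diagonal of the covariance. The function name is hypothetical.

```python
import numpy as np


def model_d_prime_sq(n, f, eps=0.002, eps_common=0.0, sigma2=1.0,
                     dmu_scale=0.2, shuffled=False):
    """Ground-truth (d')^2 for the first n cells of the model population."""
    fn = np.asarray(f, dtype=float)[:n]
    cov = (sigma2 * np.eye(n)
           + eps_common * np.ones((n, n))      # common mode fluctuation
           + eps * np.outer(fn, fn))           # information-limiting term
    if shuffled:
        cov = np.diag(np.diag(cov))            # shuffling removes correlations
    dmu = dmu_scale * fn                       # assumed form of the mean difference
    return float(dmu @ np.linalg.solve(cov, dmu))


rng = np.random.default_rng(0)
f = rng.standard_normal(1000)                  # unit-variance Gaussian direction
for n in (100, 300, 1000):
    real = model_d_prime_sq(n, f, eps_common=0.02)
    shuf = model_d_prime_sq(n, f, eps_common=0.02, shuffled=True)
    print(n, round(real, 2), round(shuf, 2))
```

Under these assumptions the unshuffled (d’)² stays below dmu_scale² / ε as n grows, whereas the shuffled value keeps increasing with n, which is the contrast the decoders are asked to detect.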
Extended Data Fig. 9 | Moving grating visual stimuli oriented at ±6° are
harder to distinguish on the basis of their evoked neural ensemble
responses than gratings oriented at ±30°, but also reveal the saturation of
information signalling in large neural populations. a, (d’)² values determined
using an ‘instantaneous’ decoder for the interval [0.70 s, 0.94 s] from visual
stimulation onset, plotted as a function of the number of cells, n, in the
ensemble in mice presented with moving gratings oriented at ±6°. Data points
represent mean values determined across 100 different subsets of cells, and
the shading represents s.e.m. As in Fig. 3f, g, we fit the (d’)² values as a function
of n using a one-parameter fit, (d’)² = (d’)²_shuffled/(1 + ε × n), where (d’)²_shuffled(n) is
the empirically determined value of (d’)² for the same number of cells in the
shuffled data, and ε is the fit parameter. For each mouse, for both real and
trial-shuffled data we normalized (d’)² values by the value of (d’_shuffled)² for n = 1,000
neurons. Goodness of fit: R² = 0.41 ± 0.17 (s.d.). N = 5 mice. ε = 0.0021 ± 0.0008
(s.d.), 122-167 trials per stimulus condition for each mouse. b, Same as a, but
using the ‘cumulative’ decoding strategy over the [0 s, 0.94 s] time interval.
c, Box-and-whisker plots of the asymptotic values of d’ in the limit of many
neurons (right) and the number of cells at which (d’)² attains half its asymptotic
value (left) as determined from parametric fits to the data of a and b for the
instantaneous (open boxes) and cumulative (filled boxes) decoding strategies.
Optimal linear decoders (green data) slightly but significantly outperformed
diagonal decoders (black data) (**P < 0.0001; one-tailed Wilcoxon rank sum
test; N = 100 different randomly chosen assignments of trials to decoder
training and test sets in each mouse; 122-167 trials per stimulus condition for
each mouse; open circles denote mean values from N = 5 individual mice).
d,e, Histograms for the real (unshuffled) and shuffled datasets of the ensemble 
neural responses to each of the two visual stimuli, projected onto the direction 
of the optimal decoding vector determined by PLS analysis, as computed in 
each mouse viewing moving gratings oriented either at +30° (d) or +6° (e), 
using all imaged neurons and the instantaneous decoding approach. Error bars 
denote counting errors. Values on the x axes are plotted for each mouse in units 
of thes.d. ofits neural ensemble responses along the decoding vector for the 
shuffled data. For each mouse, the histograms have approximately equal 
shapes for the two visual stimuli, are unimodal and approximately symmetric 
about their mean values, bolstering the use of linear decoding and d’ This 
analysis involved 217-232 trials per stimulus condition per mouse ind and 
122-167 trials per stimulus condition per mouse ine. 
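The one-parameter saturation fit described in a can be sketched as follows. This is a minimal illustration, not the authors' fitting routine: it linearizes the model (d′)² = (d′)²_shuffled/(1 + ε × n) and solves for ε by least squares through the origin. The function name, the synthetic numbers and the assumption that the shuffled (d′)² grows linearly with n are all hypothetical choices made for the example.

```python
def fit_epsilon(n_cells, d2_real, d2_shuffled):
    """Estimate eps in d2_real = d2_shuffled / (1 + eps * n).

    Rearranging gives d2_shuffled / d2_real - 1 = eps * n, a straight line
    through the origin, so ordinary least squares yields
    eps = sum(n * y) / sum(n * n) with y = d2_shuffled / d2_real - 1.
    """
    y = [s / r - 1.0 for s, r in zip(d2_shuffled, d2_real)]
    numerator = sum(n * yi for n, yi in zip(n_cells, y))
    denominator = sum(n * n for n in n_cells)
    return numerator / denominator


# Synthetic, noise-free example (hypothetical numbers, not the paper's data).
# Assumption: shuffled (d')^2 grows linearly with n, with slope ds2 per cell.
ds2, true_eps = 0.04, 0.002
n_cells = [50, 100, 200, 500, 1000, 2000]
d2_shuffled = [ds2 * n for n in n_cells]
d2_real = [s / (1.0 + true_eps * n) for s, n in zip(d2_shuffled, n_cells)]

eps_hat = fit_epsilon(n_cells, d2_real, d2_shuffled)
# Under the linear-growth assumption, (d')^2 tends to ds2 / eps as n grows
# and reaches half of that asymptote at n = 1 / eps.
asymptote = ds2 / eps_hat
n_half = 1.0 / eps_hat
print(eps_hat, asymptote, n_half)
```

With noisy data a nonlinear least-squares fit on the original scale may be preferable, because the linearization reweights the residuals.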


[Figure: Extended Data Fig. 10, panels a-k. a-e, one column per mouse (Mouse 1 to Mouse 5); x axes show the number of trials (a, b) and the number of neurons (c, d). f-k, simulated data and theory plotted against the number of trials, with panel titles ‘Correlation coefficient of actual and estimated quantities’ (f), ‘Correlation coefficient of actual and estimated eigenvector’ (i), ‘Ratio of actual to estimated eigenvalue’ (g, j) and ‘Ratio of estimated to actual Fisher information’ (h, k).]




Extended Data Fig. 10 | Hundreds of experimental trials sufficed to estimate
the statistical structure of signals and noise in visual cortical coding.
a, b, PLS analysis represents ensemble neural responses in a low-dimensional
subspace that aids understanding of visual discrimination (Fig. 4). On the
basis of Extended Data Fig. 7a-c, computations here used the five most
informative PLS dimensions. Each column shows results from an individual
mouse that viewed gratings oriented at ±30° (217-332 trials per stimulus). Each
colour denotes a different eigenvector, e_α, of the noise covariance matrix in the
five-dimensional subspace. α denotes the dimension index, {1, 2, 3, 4, 5}. As
illustrated in Fig. 4e, each mouse had multiple eigenvalues, λ_α, of the noise
covariance matrix that increased with the number of cells, n, used for analysis.
As shown in Fig. 4f, visual signals, defined as the mean separation, Δμ, between
the two response distributions, also increased with n. a, b show eigenvalues λ_α
(a) and signal components |Δμ · e_α| (b) plotted against the number of trials
analysed. Both signal and noise estimates plateau, indicating that there were
sufficient trials to accurately estimate signal and noise structure in the reduced
five-dimensional space. Throughout a-d, lines and shading denote mean ± s.d.
across 100 different randomly chosen subsets of cells and assignments of trials
to decoder training and testing, except in a, b we used all cells from each
mouse and 30 different assignments of trials. c, d, The statistical relationships
between visual signals and noise show the largest noise mode is not
information-limiting. Each mouse had multiple eigenvalues, λ_α, of the noise
covariance matrix (c) that increased with n, the number of cells. Visual signals
(d) also increased with n, as shown by decomposing Δμ into components along
the five eigenvectors, e_α. In every mouse the eigenvector with the largest
eigenvalue, e_1, was the least well aligned with the signals, Δμ (compare red
curves in c, d). e, Plots of noise values, computed as in c, versus signal values,
computed as in d, based on all recorded neurons from each mouse and the
same 100 subsets of data used in c, d. The largest noise mode (red points) was
generally an order of magnitude greater than noise modes that limited neural
ensemble signalling (green and yellow points). f-k, In a-e and throughout
much of the paper, we analysed populations of up to 2,191 neurons using 217-
332 trials with each stimulus, which sufficed to accurately determine the Fisher
information, (d′)², and principal eigenvectors of the noise covariance matrix
(Fig. 4). By comparison, there were insufficient trials to accurately determine
noise covariance matrix elements, that is, noise correlations between cell
pairs (Fig. 2d). To explain this, we derived the accuracy with which d′ and
principal noise covariance eigenvectors and eigenvalues can be estimated
through PLS analysis of recordings of n neurons across P trials, using the
computational model of Extended Data Fig. 8c (Appendix section 6 has
derivations of results in f-k). The central idea, illustrated in f, is that one can
estimate accurately the principal noise covariance eigenvector, because it has
a large eigenvalue, λ, that grows linearly with n (λ = cn, where c is a constant).
The theory predicts that the correlation coefficient, C, between estimated and
actual eigenvectors is non-zero only for c²Pn > 1 (the closed-form expression
is derived in Appendix section 6); otherwise, C = 0.
f shows predictions for C (black curve) versus the number of trials, P, for
n = 2,000 and c = 0.005. We chose this c value to fall within the lower range of
growth rates, c, of experimentally determined eigenvalues. The predicted C
values match those describing the accuracy (red points) with which we could
estimate the principal noise covariance eigenvector in the computational
model. However, correlation coefficients (blue points) between estimated and
actual individual elements of the noise covariance matrix were unsatisfactory,
even with 800 trials. i shows predicted values of C as a joint function of n and P.
Iso-contours of C are hyperbolic, revealing a tradeoff such that recording more
cells enables accurate estimation of noise eigenvectors using fewer trials. We
also derived how accurately one can estimate eigenvalues of the noise
covariance matrix, as quantified using the ratio, R_λ = λ/λ̂, where λ = cn is the
actual eigenvalue in the model and λ̂ is the estimate based on P trials. The
theory predicts a closed-form value of R_λ when c²Pn > 1 (Appendix section 6);
otherwise we set R_λ = 0, because we
cannot accurately estimate the corresponding eigenvector when c²Pn < 1. g
plots predictions of R_λ (black curve) versus P (for n = 2,000 cells and c = 0.005),
which match the accuracy with which we estimated the model eigenvalues from
simulated data (red dots). j shows R_λ predictions as a joint function of n and P.
We also studied how well one can estimate the Fisher information, (d′)², via PLS
analysis of data with fewer trials than recorded neurons. We examined the ratio,
R, of the d′ estimate to its actual value using the model and simulated data of
Extended Data Fig. 8c and found that R² is set by C_PLS, the
predicted correlation coefficient between the PLS regression vector and the
optimal one (expressions in Appendix section 6). Here Δs² and ε determine the
Fisher information in the model of Extended Data Fig. 8c. As in Extended Data
Fig. 8c, we used
ε = 0.002 to match the growth rate of (d′)² in experimental data with increasing
n, and Δs² = 0.04 to approximate the magnitude of (d′)² in the data for large
n. C²_PLS increases monotonically with P and n, confirming that PLS regression
improves as n and P increase. As C²_PLS nears 1, so does R², indicating that PLS
analysis can accurately estimate (d′)². h shows predictions for R² versus P for
n = 2,000 cells (black curve). The theory matches the accuracy with which we
estimated (d′)² via PLS analyses of the simulated model data (red dots). k shows
predicted R² values versus n and P. Iso-contours of R² are hyperbolic,
indicating recordings of more neurons permit accurate estimates of (d′)² based
on fewer trials.
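The dependence of eigenvector accuracy on the number of trials can be illustrated with a small simulation. This is a sketch under stated assumptions, not the computational model of Extended Data Fig. 8c: it uses a toy covariance with a single large eigenvalue that grows as cn on top of independent unit-variance noise, estimates the principal eigenvector by power iteration, and reports the cosine overlap with the true eigenvector as a stand-in for C. The function name and all parameter values are hypothetical and were chosen small so that the example runs quickly.

```python
import math
import random


def simulate_overlap(n, c, num_trials, rng, iterations=100):
    """Cosine overlap between the true and estimated principal noise eigenvector.

    Toy model (an assumption, not the paper's model): each trial is
    x = sqrt(c * n) * z * u + noise, with u a fixed random unit vector,
    z ~ N(0, 1) and independent unit-variance noise in every cell, so the
    covariance matrix is I + c * n * u u^T.
    """
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u_norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / u_norm for ui in u]
    scale = math.sqrt(c * n)
    trials = []
    for _ in range(num_trials):
        z = rng.gauss(0.0, 1.0)
        trials.append([scale * z * ui + rng.gauss(0.0, 1.0) for ui in u])

    # Power iteration on the sample covariance X^T X / P without forming it
    # (the model has zero mean, so no mean subtraction is applied).
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        proj = [sum(xi * vi for xi, vi in zip(x, v)) for x in trials]
        v = [sum(p * x[j] for p, x in zip(proj, trials)) for j in range(n)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
    return abs(sum(ui * vi for ui, vi in zip(u, v)))


rng = random.Random(0)
n, c = 100, 0.1  # hypothetical values giving the same product c * n = 10 as n = 2,000, c = 0.005
for num_trials in (10, 50, 200):
    print(num_trials, round(simulate_overlap(n, c, num_trials, rng), 3))
```

In this toy setting the overlap rises towards 1 as the number of trials increases, which is the qualitative behaviour that f describes.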
Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection For data acquisition, we used custom routines written in LabView software (National Instruments, version 2012 SP1, 32 bit). For 
instrument control in the pixel-multiplexing acquisition modes, we also used open-source ScanImage software (version 3.8). 


Data analysis We used open source software routines for image registration (http://bigwww.epfl.ch/thevenaz/turboreg/),
cell sorting, and partial least squares analysis (https://www.mathworks.com/matlabcentral/fileexchange/18760-partial-least-squares-and-discriminant-analysis).
Software code for extracting individual neurons and their calcium activity traces from calcium videos by using
principal component and then independent component analyses is freely available
(https://www.mathworks.com/matlabcentral/fileexchange/25405-emukamel-cellsort), although for convenience we used a commercial version of these routines (Mosaic software,
version 0.99.17; Inscopix Inc.). We wrote all other analysis software in Matlab (2017b). The primary software code used to support the
findings of the study is available at Zenodo.org (https://zenodo.org/record/3593520#.XgWPu-hKg2w).


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


The data that support the findings of this study are available from the corresponding authors upon reasonable request. 
Field-specific reporting


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size We designed the study such that each of the main results would come from 5 biological replicates, i.e., 5 different mice, for each of 2 
different experimental conditions. The experimental results from all 10 mice were similar, affording confidence in the findings. 
Data exclusions To minimize false positives during cell sorting, we adopted a conservative approach during manual classification of candidate cells, such that
we accepted for analysis only those candidates whose spatial forms and temporal dynamics were plainly those of neurons. Due to the known 
modulatory effects of locomotion on mouse visual processing, we used for analysis only those experimental trials during which the mice had 
no locomotor activity. 


Replication We reproduced the main results of our study across 10 different mice under 2 different experimental conditions (5 mice in each group). 
Randomization We split the datasets randomly into training and test subsets (usually 50% each; see Methods for details). We determined measured 
quantities by averaging across multiple realizations of such a split. For analysis performed on subsets of neurons, we chose subsets randomly 


and averaged results across multiple subsets. See Methods and Extended Data Fig. 5b for details. 
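The randomization procedure described above can be sketched as follows. This is a minimal illustration of repeated random 50/50 train/test splitting with averaging, not the authors' Matlab code; the function names, the toy scoring function and the synthetic one-dimensional responses are hypothetical.

```python
import random
from statistics import mean


def average_over_splits(trials, evaluate, num_splits=100, train_fraction=0.5, seed=0):
    """Average evaluate(train, test) over repeated random train/test splits."""
    rng = random.Random(seed)
    n_train = int(round(train_fraction * len(trials)))
    scores = []
    for _ in range(num_splits):
        shuffled = list(trials)
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        scores.append(evaluate(train, test))
    return mean(scores)


def held_out_error(train, test):
    """Toy score: squared error of the training mean on the held-out trials."""
    centre = mean(train)
    return mean((x - centre) ** 2 for x in test)


# Hypothetical one-dimensional 'responses' standing in for real trial data.
data_rng = random.Random(1)
responses = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
print(average_over_splits(responses, held_out_error, num_splits=30))
```

The same pattern extends to random subsets of neurons: draw a subset inside the loop and pass it to the scoring function.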


Blinding All animals in the experiment belonged to the same experimental group, so blinding was neither needed nor feasible. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems (n/a | Involved in the study): Antibodies; Eukaryotic cell lines; Palaeontology; Animals and other organisms; Human research participants; Clinical data

Methods (n/a | Involved in the study): ChIP-seq; Flow cytometry; MRI-based neuroimaging


Antibodies 


Antibodies used We immunostained tissue sections with antibodies against glial fibrillary acidic protein (1:2500 dilution, rabbit anti-GFAP,
Sigma HPA056030, Lot C115616) and heat shock protein 70 (1:400 dilution, mouse anti-HSP, Enzo ADI-SPA-810, Clone C92F3A-5,
Lot 01031912) and then applied fluorophore-conjugated secondary antibodies (goat anti-rabbit-Alexa 594 [Invitrogen, A-11012,
Lot 1933366] and goat anti-mouse-Alexa 488 [Invitrogen, A-11001, Lot 56881A]).


Validation We performed positive control experiments to validate the abilities of these antibodies to detect laser-induced tissue damage 
(Extended Data Fig. 2g). 
Laboratory animals We analyzed data acquired from 6 male and 4 female Ai93 triple transgenic GCaMP6f-tTA-dCre mice from the Allen Institute
(Rasgrf2-2A-dCre/CaMK2a-tTA/Ai93), which expressed the calcium indicator GCaMP6f in layer 2/3 pyramidal cells. Mice were
12-17 weeks of age when we implanted the cranial window in preparation for brain imaging. For illustrative purposes only, we
imaged a single tetO-GCaMP6s/CaMK2a-tTA mouse (ref. 42), which expressed the calcium indicator GCaMP6s in a subset of
neocortical pyramidal neurons (Supplementary Video 3).


Wild animals No wild animals were used. 
Field-collected samples There were no field-collected samples. 


Ethics oversight The Stanford University APLAC approved all procedures involving animals. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Radial glial progenitor cells (RGPs) are the major neural progenitor cells that generate
neurons and glia in the developing mammalian cerebral cortex. In RGPs, the
centrosome is positioned away from the nucleus at the apical surface of the ventricular
zone of the cerebral cortex. However, the molecular basis and precise function of this
distinctive subcellular organization of the centrosome are largely unknown. Here we
show in mice that anchoring of the centrosome to the apical membrane controls the
mechanical properties of cortical RGPs, and consequently their mitotic behaviour and
the size and formation of the cortex. The mother centriole in RGPs develops distal
appendages that anchor it to the apical membrane. Selective removal of centrosomal
protein 83 (CEP83) eliminates these distal appendages and disrupts the anchorage of the
centrosome to the apical membrane, resulting in the disorganization of microtubules
and stretching and stiffening of the apical membrane. The elimination of CEP83 also
activates the mechanically sensitive yes-associated protein (YAP) and promotes the
excessive proliferation of RGPs, together with a subsequent overproduction of
intermediate progenitor cells, which leads to the formation of an enlarged cortex with
abnormal folding. Simultaneous elimination of YAP suppresses the cortical enlargement
and folding that is induced by the removal of CEP83. Together, these results indicate a
previously unknown role of the centrosome in regulating the mechanical features of
neural progenitor cells and the size and configuration of the mammalian cerebral cortex.


A notable and unique feature of RGPs is their subcellular organization
of the centrosome, an organelle that functions as both the microtu-
bule-organizing centre and the basal body for ciliogenesis in verte-
brates. Unlike typical mammalian cells, in which the centrosome is
located next to the nucleus, in RGPs the centrosome is positioned away
from the nucleus in the apical endfoot at the surface of the ventricular
zone. Whereas the nucleus of an RGP exhibits interkinetic movement
within the ventricular zone as it proceeds through the cell cycle, the
centrosome remains located at the surface of the ventricular zone.
Moreover, individual centrosomes at the surface of the ventricular
zone in interphase RGPs support the formation of a primary cilium that
projects into the lateral ventricle. Although the centrosome has been
shown to regulate the division of RGPs and cortical neurogenesis, the
molecular and cellular basis (and the precise function) of centrosome
positioning at the surface of the ventricular zone remain largely unclear.


Distal appendages anchor the centrosome 


As shown previously, in RGPs from the mouse embryonic cortex the
centrosome (labelled by an antibody against pericentrin; PCNT) was
preferentially located at the surface of the ventricular zone and away
from the nuclei (labelled by an antibody against PAX6, a transcription
factor that is highly expressed in cortical RGPs) (Fig. 1a). To further
assess the subcellular organization, we performed serial section trans-
mission electron microscopy (ssTEM) (Fig. 1b). The mother centriole
had prominent distal appendages (DAPs) and subdistal appendages
(sDAPs), whereas the daughter centriole lacked the appendages. Moreo-
ver, the DAPs were in direct contact with a membrane pocket, indicating
that the mother centriole is anchored to the apical membrane. In addi-
tion, the mother centriole was positioned at the base of a primary cilium
arising from the membrane pocket, consistent with the function of the
mother centriole as the basal body in primary ciliogenesis. Together,
these results show that the centrosomes of interphase cortical RGPs
are anchored to the apical membrane by DAPs that are preferentially
assembled at the mother centriole.


Cep83 deletion impairs centrosome anchoring 


To investigate the molecular control of centrosome anchorage to the 
apical membrane, we examined the expression of CEP83 (also known 
Fig. 1 | Deletion of Cep83 disrupts DAPs and the anchorage of the centrosome
to the membrane. a, Representative images of E15.5 cortex stained for PCNT
(green) and PAX6 (red), and with DAPI (blue) (n = 5). The arrow indicates the
apical surface of the ventricular zone (VZ), where the centrosomes are located.
The white bar indicates the top boundary of the ventricular zone. Scale bar,
25 μm. b, Representative ssTEM images of E15.5 cortical ventricular zone
surface. Red arrows indicate DAPs; yellow arrows indicate sDAPs. DC, daughter
centriole; MC, mother centriole; PC, primary cilium. The white dashed boxes
are shown at higher magnification on the right. Scale bars, 800 nm (left);
200 nm (right). c, Representative images of E15.5 cortex stained for PCNT
(green) and CEP83 (red), and with DAPI (blue) (n = 3). d, Representative images
of E15.5 wild-type (WT) and Cep83cKO ventricular zone surface stained for
PCNT (green) and CEP83 (red), and with DAPI (blue). e, Representative images
of E15.5 wild-type and Cep83cKO ventricular zone surface stained for PCNT
(green) and CEP164 (red), and with DAPI (blue) (n = 3). Scale bars, 10 μm (main
image); 1 μm (inset) (c-e). f, Representative ssTEM images of E15.5 Cep83cKO
ventricular zone surface. Yellow arrows indicate sDAPs. The white dashed
boxes are shown at higher magnification on the right. Scale bars, 800 nm (left);
200 nm (right). g, h, Quantification of the percentage of mother centrioles with
DAPs (g) and with primary cilia and/or membrane anchorage (h). i,
Quantification of the distance from mother centrioles to the apical membrane.
Data are represented as a box plot, with median (centre line), interquartile
range (box), and minimum and maximum values (whiskers) shown. Wild type,
n = 11 centrosomes; Cep83cKO, n = 48 centrosomes (g-i). A chi-square test (g, h)
or two-sided Mann-Whitney U test (i) was used for statistical analysis.


as CCDC41), a protein that has been shown to be at the root of the DAP
assembly pathway in mammalian cell cultures. CEP83 displayed a
punctate pattern of expression and was localized to one end of the
centrosome at the surface of the ventricular zone (Fig. 1c). These results
suggest that CEP83 is expressed and localized at the centrosomes of
cortical RGPs, and may have a role in the assembly of DAPs and the
anchoring of the centrosome to the apical membrane.

To test this, we engineered a conditional Cep83 mutant mouse allele,
Cep83fl/fl, using a CRISPR-Cas9-mediated double-nicking strategy
(Extended Data Fig. 1a, b). We then crossed the Cep83fl/fl mouse with the
Emx1-Cre mouse, in which Cre recombinase is selectively expressed in cor-
tical RGPs, with strong activity by embryonic day (E) 10.5. Whereas in
the E15.5 wild-type cortex CEP83 was abundantly expressed at RGP cen-
trosomes at the surface of the ventricular zone, in the Emx1-Cre;Cep83fl/fl
conditional knockout (hereafter referred to as Cep83cKO) cortex
CEP83 was depleted (Fig. 1d). The expression of CEP164, a character-
istic marker of DAPs, was also lost (Fig. 1e), suggesting a defect in the
assembly of DAPs.

We next analysed the Cep83cKO cortex using ssTEM (Fig. 1f).
Although individual pairs of centrioles were observed with a similar
frequency at the surface of the ventricular zone, the mother centri-
oles had sDAPs but not DAPs (Fig. 1f, g). Moreover, the mother cen-
trioles were not anchored to the apical membrane and no primary
cilium was observed (Fig. 1f, h, Extended Data Fig. 1c). Consequently,
the mother centriole and centrosome showed a small but significant
(0.79 ± 0.44 μm) dislocation away from the apical membrane (Fig. 1f, i).
Together, these results demonstrate that removal of CEP83 in RGPs
disrupts DAP assembly, and impairs the anchoring of the centrosome
to the apical membrane as well as primary ciliogenesis.


Cep83 deletion causes cortical defects 


Cep83cKO mice were born at the expected frequency and survived to 
adulthood. Notably, the brains of Cep83cKO mice were significantly 
larger than those of wild-type littermate control mice at postnatal day 
(P) 21 (Fig. 2a, b). Magnetic resonance imaging (MRI) analyses showed 
that the cortex was substantially enlarged, especially in the mediodorsal
region (Fig. 2c, d). 

The enlarged cortex indicates abnormalities in neuronal production.
To examine this, we stained P21 brain sections with antibodies against
CTIP2, a marker of layers V and VI neurons, and CUX1, a marker of lay-
ers II-IV neurons (Fig. 2e). We observed a significant increase in the
overall length, thickness and area of the Cep83cKO cortex compared
with the wild-type cortex (Fig. 2f-h). In the Cep83cKO medial region
that showed the largest increase in brain volume, the densities of both
CTIP2+ and CUX1+ neurons were markedly increased compared with the
wild type (Fig. 2i, j). We also observed consistent folding in this region
of the Cep83cKO cortex, which was never seen in the wild-type cortex
(Fig. 2e, i, Extended Data Fig. 1d, e). In the dorsolateral region, the den-
sity of CUX1+ neurons was significantly higher in Cep83cKO than wild-
type cortex, whereas the density of CTIP2+ neurons was comparable
(Fig. 2k, l). Similar results were obtained with antibodies against FOXP2,
a marker of layer-VI neurons, and SATB2, a pan-neuronal marker that is
enriched in superficial layers (Extended Data Fig. 2a-d). The densities
of glial cells did not show any obvious change (Extended Data Fig. 2e, f).

Even though the densities of deep-layer neurons or glial cells in the
dorsolateral region did not significantly change, the increase in the 
total length, thickness and area of the Cep83cKO cortex indicated 
that the overall production of deep-layer neurons and glial cells was 
also substantially enhanced. No obvious hydrocephalus was observed 
(Fig. 2e). Together, these results suggest that the removal of CEP83 in 
RGPs leads to a loss of DAPs; the detaching of the centrosome from 
the apical membrane; and an enlarged cortex with excessive numbers 
of superficial-layer neurons, deep-layer neurons and glial cells and 
abnormal folding in the medial region. 

Previous studies suggest that primary cilia are crucial for the early
patterning and polarity specification of the cortical primordium,
but not essential to subsequent cortical neurogenesis and forma-
tion. To further assess the role of primary cilia in cortical RGPs,
we crossed the conditional intraflagellar transport 88 (Ift88) mutant
mouse, Ift88fl/fl, with the Emx1-Cre mouse to selectively remove IFT88, a
member of the IFT-B complex that is required for proper cilium forma-
tion and function. As expected, removal of IFT88 resulted in a loss of
primary cilia in RGPs by E13.5 (Extended Data Fig. 3a, b). We observed
no obvious defect in DAPs, sDAPs or anchoring of the mother centriole
to the membrane in RGPs that lack IFT88 (Extended Data Fig. 3b-d), nor
any defect in cortical size or neuronal density (Extended Data Fig. 3e-k).
These results provide further evidence that loss of primary cilia in RGPs
after around E11 does not alter cortical neurogenesis or formation.
Fig. 2 | Detachment of the centrosome from the apical membrane leads to an
enlarged cortex with abnormal folding. a, Representative whole-mount
images of P21 wild-type and Cep83cKO brains. Scale bar, 0.5 cm.
b, Quantification of the projected cortical area (wild type, n = 13 brains;
Cep83cKO, n = 11 brains). c, MRI images of P21 wild-type and Cep83cKO brains
along the rostrocaudal axis (numbers 1-4 represent the comparison at 4
different positions along the rostrocaudal axis, in the same brain dataset).
Warmer colours indicate a larger difference between wild type and Cep83cKO.
Scale bar, 1 mm. d, Quantification of P21 wild-type and Cep83cKO cortical
volume (n = 7 brains (14 hemispheres) for each genotype). e, Representative
images of P21 wild-type and Cep83cKO brain sections stained for CTIP2 (green)
and CUX1 (red), and with DAPI (blue). Yellow dashed outlines delineate the total
cortical area. Asterisks indicate the abnormal cortical folding in the medial
region (1 in wild type, 1′ in Cep83cKO), which is shown at higher magnification in
i. White dashed rectangles indicate a dorsal region (2 in wild type, 2′ in
Cep83cKO), which is shown at higher magnification in k. Scale bar, 1 mm.
f-h, Quantification of cortical length (f), area (g) and thickness (h) (wild type,
n = 8 brains (16 hemispheres); Cep83cKO, n = 9 brains (18 hemispheres)). Box
plots as in Fig. 1. i, k, Representative images of the medial (i) or dorsal (k) region
of P21 wild-type and Cep83cKO cortices stained for CTIP2 (green) and CUX1
(red), and with DAPI (blue). Scale bars, 200 μm (top); 100 μm (bottom).
j, l, Quantification of the number of CUX1+ (top) and CTIP2+ (bottom) neurons
per 250-μm column in the medial (j) or dorsal (l) region (wild type, n = 6 brains;
Cep83cKO, n = 5 brains). The statistical significance of the difference between
the wild-type and Cep83cKO brains in the total number of neurons (*P values),
number of superficial-layer neurons (#P values) and number of deep-layer
neurons (†P values) is shown. A two-sided Mann-Whitney U test was used for
statistical analysis. Bar charts show mean ± s.e.m.
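For readers unfamiliar with the two-sided Mann-Whitney U test named in the legend, the statistic can be sketched as follows. This is an illustrative implementation, not the software used in the study: the P value relies on the large-sample normal approximation without tie or continuity corrections, and the example counts are hypothetical numbers rather than data from the figure.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P from the normal approximation).

    U counts the pairs (xi, yj) with xi > yj, with ties counted as one half.
    The P value uses the large-sample normal approximation without a tie or
    continuity correction; exact tables are preferable for very small samples.
    """
    u_x = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided


# Hypothetical neuron counts per column for two genotypes (illustrative only).
wild_type = [101, 98, 105, 99, 103, 100]
knockout = [118, 125, 121, 130, 119]
u, p = mann_whitney_u(wild_type, knockout)
print(u, p)
```

Because every hypothetical wild-type count is below every knockout count, U for the wild-type group is 0, the most extreme value possible for these sample sizes.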


A previous study suggested that increasing Sonic hedgehog (SHH)
signalling in the developing cortex by the expression of a constitu-
tively active form of Smoothened, SmoM2 (an activator of SHH signal-
ling independent of ligand binding), enlarges the cortex and induces
folding. To further assess the role of SHH signalling in cortical RGPs,
we crossed the SmoM2 transgenic mouse, R26SmoM2, with the
Emx1-Cre mouse (Extended Data Fig. 4). Emx1-Cre;R26SmoM2 mutant (SmoM2)
mice died at the neonatal stage with severe brain dysplasia. In addition
to the loss of the olfactory bulb, the cortex was highly disorganized
with no clear laminar organization. Together, these results suggest
that increased SHH signalling in RGPs does not necessarily lead to an
enlarged cortex with folding.
Cep83 deletion promotes RGP proliferation


To pinpoint the origins of enhanced neurogenesis and cortical enlarge- 
ment in the Cep83cKO cortex, we examined the behaviour of RGPs at 
the embryonic stage. At E13.5, the Cep83cKO cortex was significantly 
larger than the wild-type cortex (Fig. 3a, b). The total length and area 
of the PAX6⁺ domain in RGPs were greatly increased in the Cep83cKO 
cortex (Fig. 3c–e), even though the density of PAX6⁺ RGPs did not significantly 
differ from that of the wild-type cortex (Extended Data Fig. 5a, b). 
These results suggest that the removal of CEP83 in RGPs leads to a drastic 
increase in the total number of RGPs, and consequently a lateral 
expansion of the developing cortex. 

RGPs divide at the surface of the ventricular zone to produce neurons 
or intermediate progenitor cells. We thus stained brain sections 
with an antibody against TBR2, a T-box transcription factor that is highly 
expressed in intermediate progenitor cells, and found that the density 
of TBR2⁺ intermediate progenitor cells in the Cep83cKO cortex was 
comparable to that in the wild-type cortex (Extended Data Fig. 5a, c). 
This indicates that removal of CEP83 in RGPs does not lead to an addi- 
tional increase in the production of intermediate progenitor cells at 
E13.5, even though the overall generation of intermediate progenitor 
cells would be enhanced owing to the increase in the number of RGPs. 

The marked increase in RGPs after removal of CEP83 probably 
arises from enhanced proliferation of RGPs. To test this, we performed 
sequential pulse-chase experiments (Fig. 3f, Extended Data Fig. 5d-f). 
We administered a single dose of 5-ethynyl-2’-deoxyuridine (EdU; a 
modified nucleoside) at E12.5, followed by a single dose of 5-bromo-2’-deoxyuridine 
(BrdU; a thymidine analogue), and collected the brains 
one hour later for analyses. We found that the percentage of EdU⁺ RGPs 
in the ventricular zone that were also labelled with BrdU (EdU⁺BrdU⁺ RGPs) was 
substantially increased in the Cep83cKO cortex compared with the wild-type 
cortex (Fig. 3f, g, Extended Data Fig. 5d, e), suggesting that dividing 
RGPs in the Cep83cKO cortex re-enter the cell cycle faster than those 
in the wild-type cortex. The acceleration of cell-cycle progression was 
corroborated by an increased density of BrdU⁺ RGPs (Fig. 3f, h, Extended 
Data Fig. 5d, f). Collectively, these results suggest that removal of CEP83 
in RGPs accelerates re-entry into the cell cycle, which leads to an increase 
in RGP production and a lateral expansion of the developing cortex at 
the early embryonic stage of cortical development. 
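
The readout of the sequential pulse-chase experiment described above is the percentage of EdU⁺ cells that are also BrdU⁺. A minimal sketch of that calculation is given below; it assumes that each scored cell has a hypothetical integer identifier and that the two lists come from manual or automated scoring of one ventricular-zone region. Neither the identifiers nor the function name come from the study.

```python
def double_labelled_percentage(edu_positive_ids, brdu_positive_ids):
    """Percentage of EdU+ cells that are also BrdU+ (EdU+BrdU+ / EdU+ x 100)."""
    edu = set(edu_positive_ids)
    if not edu:
        raise ValueError("at least one EdU+ cell is required")
    double_labelled = edu & set(brdu_positive_ids)
    return 100.0 * len(double_labelled) / len(edu)


# Hypothetical scoring of one region: cell identifiers are arbitrary labels.
edu_cells = [1, 2, 3, 4, 5, 6, 7, 8]
brdu_cells = [5, 6, 7, 8, 9, 10]
print(double_labelled_percentage(edu_cells, brdu_cells))
```

A higher percentage means that more of the cells that were in S phase at the EdU pulse are again in S phase at the BrdU pulse, which is how the text infers faster cell-cycle re-entry.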


Cep83 deletion enhances radial neuronal production 


To further dissect the cellular basis of the abnormal development of the 
Cep83cKO cortex, we examined the behaviour of cortical progenitor 
cells at E15.5 (Fig. 3i-o). The Cep83cKO cortex remained significantly 
larger than the wild-type cortex (Fig. 3i,j). The length and area of the 
PAX6⁺ domain in RGPs were significantly increased in the Cep83cKO 
cortex (Fig. 3k, l). Although the density of PAX6⁺ RGPs in the ventricular 
zone was comparable (Fig. 3m, n), the density of TBR2⁺ intermediate 
progenitor cells in the subventricular zone was significantly increased 
in the Cep83cKO cortex (Fig. 3m, o). These results suggest that removal 
of CEP83 in RGPs leads to a subsequent increase in the production 
of intermediate progenitor cells at the late embryonic stage of cortical 
development. Consistently, we observed a substantial increase in 
mitotic cells labelled by phosphorylated histone H3 (P-HH3) in the 
subventricular zone, but not at the ventricular zone surface (Extended 
Data Fig. 6a, e, f). The P-HH3⁺ cells in the subventricular zone of the 
Cep83cKO cortex were predominantly intermediate progenitor cells, and 
not outer subventricular zone RGPs (also called basal or intermediate 
RGPs) (Extended Data Fig. 6b, c, d, g–l). Notably, the densities of 
PAX6⁺ RGPs in the ventricular zone and TBR2⁺ intermediate progenitor 
cells in the subventricular zone were significantly increased in the 
dorsomedial region (Extended Data Fig. 5g-j), where folding repeatedly 
occurred, consistent with the more-drastic increase in neuronal 
densities in this region (Fig. 2i,j). 
Fig. 3 | Detachment of the centrosome from the apical membrane leads to 
excessive proliferation of RGPs and an additional increase in the production 
of intermediate progenitor cells. a, Representative whole-mount images of 
E13.5 wild-type and Cep83cKO brains. Scale bar, 1 mm. b, Quantification of the 
projected cortical area (wild type, n=6 brains; Cep83cKO, n=5 brains). 
c, Representative images of E13.5 wild-type and Cep83cKO cortices stained for 
PAX6 (green), and with DAPI (blue). The arrowheads indicate the boundaries of 
the PAX6⁺ domain. Scale bar, 0.5 mm. d, e, Quantification of the length (d) and 
area (e) of the PAX6⁺ domain (wild type, n=5 brains; Cep83cKO, n=6 brains). 
f, Representative images of E12.5 wild-type and Cep83cKO cortices 
(dorsolateral region) that were subjected to EdU (red) and BrdU (green) 
sequential pulse-chase labelling (top schematic). Cortices were stained for 
PAX6 (grey), and with DAPI (blue). Scale bars, 50 µm (top); 25 µm (bottom). 
g, h, Quantification of the percentage of EdU⁺BrdU⁺ cells among the total EdU⁺ 
cells in the ventricular zone (g), and the number of BrdU⁺ cells in the ventricular 
zone per 250-µm column (h) (wild type, n=8 brains; Cep83cKO, n=8 brains). 
i, Representative whole-mount images of E15.5 wild-type and Cep83cKO brains. 
Scale bar, 1 mm. j, Quantification of the projected cortical area (wild type, n=32 
brains; Cep83cKO, n=9 brains). k, Representative images of E15.5 wild-type and 
Cep83cKO brain sections stained for PAX6 (green), and with DAPI (blue). 
Arrowheads indicate the boundaries of the PAX6⁺ domain. Scale bar, 0.5 mm. 
l, Quantification of the length of the PAX6⁺ domain (wild type, n=8 brains; 
Cep83cKO, n=7 brains). m, Representative images of E15.5 wild-type and 
Cep83cKO cortices (dorsolateral region). Cortices were stained for PAX6 
(green) and TBR2 (red), and with DAPI (blue). CP, cortical plate; IZ, intermediate 
zone; SVZ, subventricular zone. Scale bar, 50 µm. n, o, Quantification of the 
number of PAX6⁺ (n) and TBR2⁺ (o) cells per 250-µm column (wild type, n=8 
brains; Cep83cKO, n=6 brains). A two-sided Mann-Whitney U test was used for 
statistical analysis. Box plots as in Fig. 1. 


To test whether other components of the DAP assembly process regulate 
the behaviour of RGPs and cortical development, we engineered 
short hairpin RNAs (shRNAs) against Cep89 and Sclt1, two parallel 
components of DAP assembly downstream of CEP83 (Extended Data 
Fig. 7a–d). Suppression of the expression of CEP89 and SCLT1 led to 
a significant increase (relative to controls that were not treated with 
shRNAs or treated with non-effective shRNAs) in both PAX6⁺ RGPs 
and TBR2⁺ intermediate progenitor cells (Extended Data Fig. 7e-h), 
suggesting that removal of other DAP assembly components, similar 
to removal of CEP83, leads to the excessive production of RGPs and 
intermediate progenitor cells in the developing cortex. 


Fig. 4 | Centrosome detachment disrupts microtubule organization and 
results in stretching and stiffening of the apical membrane. 
a, Representative en face images of E15.5 wild-type and Cep83cKO ventricular 
zone surface stained for α-tubulin (green), actin (red) and PCNT (blue). Scale 
bars, 5 µm (left), 1 µm (right, top) and 2 µm (right, bottom). b-e, Quantification 
of the intensity of α-tubulin (microtubules) (b) or PCNT (c) per apical domain; 
the perimeter of individual junctions (d); and the intensity of actin per unit of 
junction length (e) (wild type, n=478 (b), 361 (c) and 200 (d, e) apical domains 
from 4 embryos; Cep83cKO, n=443 (b), 258 (c) and 200 (d, e) apical domains 
from 4 embryos). AU, arbitrary units. f, Schematic diagram showing the use of 
AFM to analyse the stiffness of the apical membrane. The indentation of the 
cantilever probe generates force-distance curves, including the approach 
curve (red) and the retraction curve (blue). d, indentation depth. 
g, Representative heat maps of Young’s modulus, reflecting the stiffness of 
E15.5 wild-type and Cep83cKO ventricular zone surface. h, Quantification of 
the Young’s modulus of wild-type and Cep83cKO ventricular zone surface (wild 
type, n=10 sample areas; Cep83cKO, n=9 sample areas; from 3 brains for each 
genotype). A two-sided Mann-Whitney U test was used for statistical analysis. 
Box plots as in Fig. 1. 


Disruption of apical microtubule organization 

To further reveal the underlying mechanisms of CEP83 function, we 
examined the properties of the apical membrane (that is, the surface 
of the ventricular zone) to which the centrosome is normally anchored. 
We prepared whole-mount cortical slabs at E15.5, stained these with 
antibodies against PCNT, α-tubulin and actin and acquired en face 
images of the apical membrane (Fig. 4a). In the wild-type cortex, actin 
marked cell junctions that were formed between the apical endfeet of 
neighbouring RGPs, and a prominent centrosome revealed by PCNT 
staining was commonly found within individual apical endfeet (Fig. 4a, 
top). Notably, microtubules (labelled by α-tubulin staining) often 
formed a ring-like structure in juxtaposition with actin-labelled junctions 
(Fig. 4a, top insets). By contrast, in the Cep83cKO cortex, although 
the junctions and the positioning of the centrosome inside the apical 
endfeet remained largely intact, the ring-like microtubule structure 
disappeared (Fig. 4a, bottom insets). Fibrous microtubules were con- 
sistently observed. The intensity of microtubules in individual apical 
domains was significantly reduced (Fig. 4b), whereas the intensity of 
PCNT was similar (Fig. 4c). The normal expression of PCNT and the 
existence of fibrous microtubules indicate that microtubule formation 
is not systematically compromised in the absence of CEP83. Consist- 
ent with this, the non-apical membrane microtubules—as well as the 
microtubules in mitotic RGPs—did not exhibit any obvious difference 
in the Cep83cKO cortex compared to the wild type (Extended Data 
Fig. 8a, b). No obvious change was observed in RGP polarity, expression 
of junction proteins or radial scaffolding in the Cep83cKO cortex 
(Extended Data Fig. 8c-i). Together, these results suggest that centro- 
some detachment impairs the organization of microtubules specifically 
at the apical membrane. 


Alteration of apical membrane properties 


The junction size that corresponds to the apical membrane of indi- 
vidual RGPs at the surface of the ventricular zone appeared enlarged 
in the Cep83cKO cortex (Fig. 4a). To confirm this, we systematically 
examined junction size at E13.5 and E15.5 by staining for actin, and 
found that the junction size of both interphase and mitotic RGPs was 
significantly larger in the Cep83cKO cortex than the wild-type cortex 
(Fig. 4d, Extended Data Fig. 9a–c), suggesting that the apical membrane 
of RGPs and the junction between RGPs are stretched and enlarged in 
the Cep83cKO cortex. Coinciding with this, the intensity of actin per 
unit of junction length was significantly reduced (Fig. 4e). 

The stretching and enlargement of the apical membrane and junc- 
tions of RGPs suggest that the mechanical properties of the surface 
of the ventricular zone might be altered. To directly test this, we used 
atomic force microscopy (AFM), which allows a quantitative examina- 
tion of the stiffness of live tissue (Fig. 4f). We prepared acute whole- 
mount wild-type and Cep83cKO cortical slabs and performed AFM 
analysis. The Young’s modulus (also known as the elastic modulus) of 
the apical membrane was significantly higher in the Cep83cKO cortex 
than the wild-type cortex (Fig. 4g, h). These results suggest that the 
detachment of the centrosome increases the stiffness of the apical 
membrane, where RGP division selectively occurs. 


Cortical defects depend on YAP 


Cell stretching and increased tissue rigidity activate YAP, a crucial transcriptional 
co-activator in the HIPPO signalling pathway that regulates 
cell proliferation and organ size. In line with this, we found that the 
expression of YAP in the ventricular zone was significantly higher in the 
Cep83cKO cortex than the wild-type cortex (Fig. 5a, b). Moreover, in the 
Cep83cKO cortex, significantly more PAX6* RGPs showed a strong YAP 
signal in the nucleus (Fig. 5a, c). By contrast, YAP expression in the wild- 
type RGP nucleus was generally low. Nuclear expression of YAP in TBR2* 
intermediate progenitor cells in the wild-type or Cep83cKO cortex was 
also low (Extended Data Fig. 9d). In addition, we observed no obvi- 
ous difference in YAP expression between dissociated wild-type and 
Cep83cKO RGPs in culture, with no junction formation (Extended Data 
Fig. 9e, f). Together, these results suggest that the detachment of the 
centrosome and the stretching and stiffening of the apical membrane 
cause YAP overexpression and nuclear localization selectively in RGPs. 

We next asked whether the enlargement and folding of the Cep83cKO 
cortex depends on an increase in the expression and activation of YAP. 
We crossed the conditional Yap mutant mouse (Yapfl/fl) with the Emx1-Cre 
or Emx1-Cre;Cep83fl/fl mouse, to generate mice with individual or simultaneous 
deletions of Cep83 and Yap (also known as Yap1) in cortical 
RGPs. Compared with wild-type littermate controls, Emx1-Cre;Yapfl/fl 
(YapcKO) mice did not show any obvious change in the size or neuronal 
density of the cortex (Fig. 5d-g, Extended Data Fig. 10a, b). This 
is consistent with a relatively low level of YAP expression in RGP nuclei 
under normal conditions (Fig. 5a). Although the Cep83cKO cortex was 
significantly enlarged, with abnormal folding and increased neuronal 
density, the Emx1-Cre;Cep83fl/fl;Yapfl/fl (Cep83 Yap conditional double 
knockout, cDKO) cortex was comparable to the wild type (Fig. 5d-g, 
Extended Data Fig. 10a, b), indicating that the simultaneous deletion 
of Yap effectively suppresses the increase in cortical neurogenesis 
that is triggered by the deletion of Cep83. We did not observe any fold- 
ing in the Cep83YapcDKO cortex (Fig. 5f). Consistent with the notion 
that increased YAP expression and activation is downstream of apical 
membrane stretching and stiffening, the apical domain remained significantly 
larger in the Cep83YapcDKO cortex than in the wild-type 
cortex (Extended Data Fig. 10c-f). Together, these results demonstrate 
that the cortical enlargement and folding caused by the removal of 
CEP83 in RGPs depend on the overactivation of YAP, which is caused 
by apical membrane stretching and stiffening. 


Fig. 5 | Excessive activation of YAP is essential for the enlargement and 
abnormal folding of the Cep83cKO cortex. a, Representative images of 
E15.5 wild-type and Cep83cKO cortices stained for YAP (green) and PAX6 
(red), and with DAPI (blue). Scale bars, 50 µm (main image); 5 µm (inset). 
b, c, Quantification of YAP intensity (b) and the percentage of PAX6⁺ cells with 
nuclear YAP (c) in the ventricular zone (n=6 brains for each genotype). 
d, Representative whole-mount images of P21 wild-type, YapcKO, Cep83cKO 
and Cep83YapcDKO brains. Scale bar, 0.5 cm. e, Quantification of the projected 
cortical area (wild type, n=8 brains; YapcKO, n=6 brains; Cep83cKO, n=7 
brains; Cep83YapcDKO, n=12 brains). f, Representative images of P21 wild-type 
and Cep83cKO brain sections stained for CTIP2 (green) and CUX1 (red), and 
with DAPI (blue). Yellow dashed outlines delineate the total cortical area. 
Asterisk indicates the abnormal cortical folding in the medial region. 
Scale bar, 1 mm. g, Quantification of P21 wild-type, YapcKO, Cep83cKO and 
Cep83YapcDKO cortical length (wild type, n=4 brains; YapcKO, n=6 brains; 
Cep83cKO, n=6 brains; Cep83YapcDKO, n=8 brains). h, Model indicating the 
positioning and function of the centrosome in cortical RGPs. MT, microtubule; 
IPs, intermediate progenitor cells. A two-sided Mann-Whitney U test was used 
for statistical analysis. Box plots as in Fig. 1. 


Discussion 

The centrosome in RGPs is anchored to the apical membrane by the 
DAPs. Removal of CEP83, a DAP protein, in RGPs disrupts the formation 
of DAPs and causes the detachment of the centrosome from 
the apical membrane. This subtle (less than 1 µm) dislocation of the 
centrosome causes substantial changes in the behaviour of RGPs in 
the developing cortex. Our side-by-side comparisons of Ift88cKO and 
SmoM2 brains with Cep83cKO brains suggest that the drastic enlarge- 
ment and abnormal folding (albeit with normal lamination and lat- 
eral ventricle size) of the Cep83cKO cortex is unlikely to be a result 
of primary cilium loss or increased SHH signalling. Instead, we have 
uncovered a previously unrecognized function of the centrosome in 
RGPs (Fig. 5h). The docking of the centrosome to the apical membrane 
supports the formation of prominent ring-like microtubule structures 
that are juxtaposed to the cell junctions, and this is likely to promote 
intracellular cortical contractile force in conjunction with the actin 
network. The contractile force of individual RGPs is balanced by the 
intercellular tugging force exerted between neighbouring RGPs that 
form junctions with each other, which would determine the overall 
stiffness or rigidity of the surface of the ventricular zone. Primary 
ciliogenesis may further strengthen the tethering of the centrosome 
and influence the organization of microtubules and the properties of 
the apical membrane. A similar microtubule ring and intricate organization 
of the centrosome has been observed in neuroepithelial cells 
from the spinal cord of chickens, suggesting that this is a common 
feature of neural progenitor cells. 

Although an increase in neuronal density was observed throughout 
the cortical area in the Cep83cKO brain, the folding occurred predomi- 
nantly in the medial region, in which the density of both deep-layer and 
superficial-layer neurons was markedly increased. These observations 
are consistent with the notion that a substantial radial expansion in 
neurogenesis is crucial for the folding of the cortex. The local anatomical 
organization might also render the medial region more susceptible 
to folding. Our data suggest that the subcellular organization of the 
centrosome and the mechanical properties of neural progenitor cells 
affect their proliferative and neurogenic capacity. Notably, as development 
proceeds, the stiffness of the surface of the ventricular zone 
in the mouse cortex gradually decreases. In addition, the surface of 
the ventricular zone appears to be stiffer in ferrets (which develop a 
large and gyrated cortex) than in mice. These observations point to 
a relationship between the mechanical properties of RGPs and the size 
and formation of the cortex. The enlarged cortex with excessive neurogenesis 
that we observed in the absence of CEP83 reveals a link between 
centrosomal abnormalities and brain overgrowth (that is, 
megalencephaly). Biallelic mutations in human CEP83 have been found to cause 
infantile nephronophthisis and intellectual disability, underscoring 
the importance of CEP83 and centrosome positioning in controlling 
the development and function of the human brain. 
Methods 


Mouse lines 

The Cep83 conditional knockout mice were generated using a CRISPR- 
Cas9-mediated double-nicking strategy. Guide RNAs (gRNAs) were 
designed and synthesized according to described methods. A pair of 
gRNAs, In3A (5’-GGTTTCCTGACAACGCAGAT-3’) and In3B 
(5’-TCAGTCCAGTTCAGTAGCGT-3’), was selected for the high targeting efficiency of 
these gRNAs based on a Surveyor assay (Integrated DNA Technologies) 
and cloned into a pX335 vector. To generate a minivector gene-targeting 
construct, a DNA fragment of mouse Cep83 containing the critical exon 
3 was amplified from BAC clone RP23-422L20 (Children’s Hospital 
Oakland Research Institute) and cloned into a pL451 vector using the 
Golden Gate Assembly method. A mixture of pX335-In3A, pX335-In3B 
and pL451-Cep83 flox-neo plasmids was then electroporated into a 
W4 mouse embryonic stem (ES) cell line on a 129S6 background for 
gene targeting (Rockefeller University Gene Targeting Resource 
Centre). Correctly targeted ES cell clones were screened by Southern 
blot against the 5’-homology arm, and confirmed by long-range PCR, 
genotyping and sequencing. ES cell clones were microinjected into 
C57B6/J blastocysts for chimaera production. Male chimaeras were 
crossed to multiple C57B6/J females to screen and obtain Cep83 flox-neo 
mice through genotyping. Actin-Flp transgenic mice (005703; The 
Jackson Laboratory) were used to excise the Neo selection cassette 
and obtain Cep83 floxed mice. Genotyping primers for the Cep83 floxed allele 
at the 5’ loxP site were: forward, 5’-AGTGGGCTGTGAATGTAGTCTT-3’; 
reverse, 5’-AGCCAACCAATAATACAGAAAACA-3’. Deletion of exon 3 
creates a frameshift in subsequent exons and thereby interferes with 
the expression of the CEP83 protein. Ift88fl/fl (ref. *"), Yapfl/fl (ref. #8) and 
R26-LSL-SmoM2 (ref. *) mice were provided by B. Yoder, J. Wrana and 
A. McMahon, respectively. Emx1-Cre (005628; The Jackson Laboratory) 
was used to delete genes in the cortex. Genotyping was carried out 
using standard PCR protocols. Both male and female mice were used 
in the study. The mice were maintained at the facilities of Memorial 
Sloan Kettering Cancer Centre and Tsinghua University, and all animal 
procedures were approved by the Institutional Animal Care and Use 
Committees. For timed pregnancies, the plug date was designated 
as E0 and the date of birth was defined as P0. No wild animal or 
field-collected sample was used in the study. 


shRNA design and in utero electroporation 

Three shRNA sequences against each of Cep83, Cep89 and Sclt1 were designed 
as follows: Cep83 shRNA-a (5’-GCAAGCAAGCCAGGAAAAA-3’), Cep83 
shRNA-b (5’-GCTCCAATGCGAGAACGTT-3’), Cep83 shRNA-c 
(5’-GCTAGAACTTGAGAACAGA-3’); Cep89 shRNA-a 
(5’-GGACGTCATTACCATCCT-3’), Cep89 shRNA-b (5’-GGGCCCCACACCACCCTGG-3’), Cep89 shRNA-c 
(5’-GTCGTGAAGGAAAACGAAGCC-3’); Sclt1 shRNA-a 
(5’-GATAAACTAAATGATATT-3’), Sclt1 shRNA-b (5’-AAATGCATCAAAGATGTC-3’), Sclt1 shRNA-c 
(5’-GGCAAACAGGATGAAAGTGA-3’). All sense and anti-sense oligos were 
purchased from IDT. Annealed oligos were cloned into the HpaI and XhoI 
sites of the lentiviral vector pLL3.7. In utero electroporation was 
performed as previously described. In brief, a timed pregnant CD-1 mouse at 
E13.5 was anaesthetized, the uterine horns were exposed and around 1 µl 
plasmid DNA mixed with Fast Green (Sigma) was microinjected through 
the uterus into the lateral ventricle manually using a bevelled and calibrated 
glass micropipette (Drummond Scientific). For electroporation, 
five 50-ms pulses of around 35 mV with a 950-ms interval were delivered 
across the uterus with two 5-mm electrode paddles positioned on either 
side of the head (BTX, ECM830). During the procedure, the embryos were 
constantly bathed with warm sterile PBS (pH 7.4). After electroporation, 
the uterus was placed back in the abdominal cavity and the wound was 
surgically sutured. After surgery, the mouse was placed in a 28 °C recovery 
incubator with proper analgesic treatments until it recovered and 
resumed normal activity. All procedures for animal handling and usage 
were approved by the institutional research animal resource centre. 
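
The shRNA design above lists only the sense target sequences, whereas both sense and anti-sense oligos were ordered. As a small illustration, the anti-sense strand of a target is its reverse complement, which can be sketched as follows. The helper name is hypothetical, and the sketch deliberately omits the hairpin loop and the cloning overhangs for pLL3.7, which are not specified in this section.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    sequence = sequence.upper()
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return sequence.translate(_COMPLEMENT)[::-1]


# Target sequence of Cep83 shRNA-a as listed in the text (5' to 3').
cep83_shrna_a = "GCAAGCAAGCCAGGAAAAA"
print(reverse_complement(cep83_shrna_a))
```

Applying the function twice returns the original sequence, which is a quick sanity check when preparing oligo orders.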


Brain sectioning, immunohistochemistry and imaging 

Timed pregnant females that carried conditional mutant alleles were 
anaesthetized and embryos were removed and perfused with ice-cold 
phosphate buffered saline (PBS, pH 7.4), followed by 4% paraformaldehyde 
(PFA). Brains were post-fixed with 4% PFA for around 6 h, 
cryo-protected and sectioned at 12 µm for immunohistochemistry as 
previously described. Postnatal mice were similarly processed and 
cryo-sectioned at 40 µm. For en face analysis of the ventricular surface, 
embryos were perfused with warm PBS and PFA to avoid microtubule 
depolymerization. The dorsal telencephalon was then dissected out 
of the embryonic brain to expose the ventricular surface for immu- 
nohistochemistry. The following primary antibodies were used: 
Alexa Fluor 546 phalloidin (A22283; RRID: AB_2632953; lot 1947552; 
1:500, Thermo Fisher Scientific), goat anti-FOXP2 (AB16046; RRID: 
AB_2107107; lot GR3237165-1; 1:100, Abcam), goat anti-SOX2 (SC-17320; 
RRID: AB_2286684; lot E0715; 1:500, Santa Cruz), chicken anti-GFP 
(GFP-1020; RRID: AB_10000240; lot GFP879484; 1:500, AVES), mouse 
anti-β-catenin (610153; RRID: AB_397554; lot 7187864; 1:500, BD 
Biosciences), mouse anti-S100a/ (SC-58839; RRID: AB_2183338; lot K1215; 
1:200, Santa Cruz), mouse anti-phospho-vimentin (AB22651; RRID: 
AB_447222; lot GR3233697-1; 1:500, Abcam), mouse anti-N-cadherin 
(AB98952; RRID: AB_10696943; lot GR287147-10; 1:500, Abcam), mouse 
anti-nestin (RAT-401; RRID: AB_2235915; lot 5/26/2016; 1:500, 
Developmental Studies Hybridoma Bank), mouse anti-neurofilament (837904; 
RRID: AB_2566782; lot B263754; 1:500, BioLegend), mouse anti-PCNT 
(611814; RRID: AB_399294; lot 8163868; 1:200, BD Biosciences), mouse 
anti-α-tubulin (T9026; RRID: AB_477593; lot 047M4789V; 1:1,000, 
Sigma-Aldrich), mouse anti-YAP (SC-101199; RRID: AB_1131430; lot 
F2916; 1:200, Santa Cruz), mouse anti-ZO-1 (33-9100; RRID: AB_87181; 
lot TH275232; 1:200, Thermo Fisher Scientific), rabbit anti-ARL13B 
(1:500), rabbit anti-BLBP (AB32423; RRID: AB_880078; lot GR260227-2; 
1:500, Abcam), rabbit anti-CEP83 (HPA038161; RRID: AB_10674547; 
lot A91789; 1:200, Sigma-Aldrich), rabbit anti-CEP89 (AB204410; 
validated by western blot and immunostaining; lot GR3247629-1; 
1:500, Abcam), rabbit anti-ODF2 (12058-1-AP; RRID: AB_2156630; lot 
00050046; 1:500, Proteintech), rabbit anti-CEP164 (HPA037606; RRID: 
AB_10672498; lot A95909; 1:200, Sigma-Aldrich), rabbit anti-CUX1 
(SC-13024; RRID: AB_2261231; lot H2815; 1:200, Santa Cruz), rabbit 
anti-HOPX (HPA030180; RRID: AB_10603770; lot C105589; 1:1,000, 
Sigma-Aldrich), rabbit anti-MAP2 (AB5622; RRID: AB_11213363; lot 
3053795; 1:500, EMD Millipore), rabbit anti-PARD3 (HPA030443; RRID: 
AB_10600926; lot C105765; 1:500, Sigma-Aldrich), rabbit anti-PAX6 
(901301; RRID: AB_256003; lot B267205; 1:500, BioLegend), rabbit 
anti-PAX6 (PD022; RRID: AB_1520876; lot 005; 1:500, MBL), rabbit 
anti-PCNT (AB4448; RRID: AB_304461; lot GR3200989-1; 1:500, Abcam), 
rabbit anti-OLIG2 (AB9610; RRID: AB_570666; lot 2950732; 1:500, 
EMD Millipore), rabbit anti-P-HH3 (AB47297; RRID: AB_880448; lot 
GR3190286-11; 1:1,000, Abcam), rabbit anti-PTPRZ1 (HPA015103; RRID: 
AB_1855946; lot B105439; 1:500, Sigma-Aldrich), rabbit anti-SATB2 
(AB92446; RRID: AB_10563678; lot GR285095-11; 1:500, Abcam), rabbit 
anti-TNC (AB108930; RRID: AB_10865908; lot GR308354-7; 1:500, 
Abcam), rat anti-BrdU (AB6326; RRID: AB_305426; lot GR191332-1; 1:500, 
Abcam), rat anti-CTIP2 (18465; RRID: AB_2064130; lot GR203038-2; 
1:1,000, Abcam), rat anti-TBR2 (12-4875-82; RRID: AB_1603275; lot 
4279686; 1:100, eBioscience). Alexa Fluor 488 donkey anti-rabbit IgG 
(A-21206; RRID: AB_141708; lot 1910751; 1:1,000, Thermo Fisher 
Scientific), donkey anti-mouse IgG (A-21202; RRID: AB_141607; lot 1890861; 
1:1,000, Thermo Fisher Scientific), donkey anti-goat IgG (A-11055; RRID: 
AB_2534102; lot 1627966; 1:1,000, Thermo Fisher Scientific), goat 
anti-rat IgG (A-11006; RRID: AB_141373; lot 1887148; 1:1,000, Thermo Fisher 
Scientific), donkey anti-chicken IgY (703-546-155; RRID: AB_2340376; 
lot 132803; 1:1,000, Jackson ImmunoResearch), Alexa Fluor 555 donkey 
anti-rabbit IgG (A-21432; RRID: AB_141788; lot 1866859; 1:1,000, 
Thermo Fisher Scientific), donkey anti-mouse IgG (A-31570; RRID: 
AB_2536180; lot 1850121; 1:1,000, Thermo Fisher Scientific), donkey 
anti-goat IgG (A-21432; RRID: AB_141788; lot 2026158; 1:1,000, Thermo 
Fisher Scientific), Alexa Fluor 594 donkey anti-rat IgG (A-21209; RRID: 
AB_2535795; lot 1905801; 1:1,000, Thermo Fisher Scientific), Alexa Fluor 
647 donkey anti-rabbit IgG (A-31573; RRID: AB_2536183; lot 1903516; 
1:1,000, Thermo Fisher Scientific), Alexa Fluor 647 donkey anti-mouse 
IgG (A-31571; RRID: AB_162542; lot 1839633; 1:1,000, Thermo Fisher 
Scientific), donkey anti-goat IgG (A-21447; RRID: AB_141844; lot 1627966; 
1:1,000, Thermo Fisher Scientific), goat anti-rat IgG (A-21247; RRID: 
AB_141778; lot 1858181; 1:1,000, Thermo Fisher Scientific) conjugated 
secondary antibodies were used. For EdU and BrdU double pulse-chase 
analysis, mice were weighed and injected with EdU (10 µg per gram 
body weight) and BrdU (50 µg per gram body weight) sequentially. EdU 
staining was performed using Click-iT EdU Alexa Fluor 647 Imaging Kit 
(Thermo Fisher Scientific). Before proceeding with BrdU staining, tissue 
sections were blocked with azidomethylphenylsulfide to minimize the 
cross-reactivity of anti-BrdU antibody against EdU. BrdU staining was 
performed as described previously. Coronal sections were imaged 
with a FV1200 or FV3000 confocal microscope (Olympus) and Nano- 
Zoomer 2.0 HT slide scanner (Hamamatsu Photonics). Free-floating 
dorsal telencephalon was submerged in PBS and positioned in an en face 
view, and imaged with a FV1200 or FV3000 confocal microscope with 
water-immersion objectives. Cortical length and area were estimated 
by measuring the overall length and area of the dorsal cortex in indi- 
vidual brain sections at a similar rostrocaudal position. The densities of 
neurons were quantified by measuring the number of cells positive for 
specific markers in a 250-µm-wide rectangular region perpendicular 
to the pia covering the entire cortex. En face images of the ventricular 
surface were automatically segmented with the Fiji plug-in Tissue 
Analyzer and manually corrected. Cell boundaries at the edges of 
images were manually removed and thereby excluded from analysis. 
The segmented images were then transformed into labelled images with 
the Fiji plug-in MorphoLibJ. Subsequently, apical domain sizes were 
measured through the particle analysis function of Fiji. For visualization 
of the results, apical domain size was colour-coded with MATLAB 
(v.R2016b, MathWorks). All images were analysed and processed using 
Fluoview (v.4.2, Olympus), Volocity (v.6.3, Perkin Elmer), ImageJ (Fiji) 
(v.1.52p, NIH), NDP viewer (v.2, Hamamatsu Photonics), Imaris (v.9.0.1, 
Oxford Instruments) and Photoshop (Adobe). 


ssTEM 

For TEM analysis, timed pregnant females were prepared and embryos 
were removed and perfused with 0.1M sodium cacodylate buffer (pH 
7.4) and a fixative containing 2% PFA and 2.5% glutaraldehyde at room 
temperature, followed by post-fixation overnight with the same fixa- 
tive at 4 °C. Brains were then sliced into 1-mm thick coronal sections 
with a mouse brain mould. The selected slices were re-fixed with 2.5% 
glutaraldehyde and 0.1% tannic acid for one hour and then with 2.5% 
glutaraldehyde overnight. The slices were post-fixed with 1% osmium 
tetroxide and 0.4% potassium ferrocyanide for 1 h, followed by en bloc 
staining with 1% uranyl acetate for 30 min. Sections were subsequently 
dehydrated with a graded ethanol series, infiltrated and embedded with 
Eponate 12 resin (Electron Microscopy Sciences). Serial sections (70 nm) 
of brain regions close to the ventricular surface were cut by an 
ultramicrotome (Ultracut E; Leica). Serial images of centrioles from RGPs at 
the ventricular surface were acquired with a JEOL 100CX transmission 
electron microscope with a digital imaging system (XR41-C, Advantage 
Microscopy Technology) at 80 kV at the Rockefeller University Electron 
Microscopy Resource Centre. 


MRI analysis 

Ex vivo MRI of 4% PFA-fixed mouse brain specimens was performed
on a horizontal 7 Tesla MR scanner (Bruker Biospin) with a triple-axis
gradient system. Images were acquired using a quadrature volume
excitation coil (72-mm diameter) and a receive-only 4-channel phased
array cryogenic coil. The specimens were imaged with the skull intact
and placed in a syringe filled with Fomblin to prevent tissue dehydration.
For MRI-based characterization of macroscopic brain morphology,
diffusion MRI data were acquired instead of conventional T₁- or
T₂-weighted MRI, from P21 mouse brains that were not yet fully
myelinated. High-resolution diffusion MRI data were acquired using a
modified three-dimensional (3D) diffusion-weighted gradient and spin echo
(DW-GRASE) sequence with the following parameters: echo time (TE)/
repetition time (TR) = 30/500 ms; two signal averages; field of view
(FOV) = 12.8 mm × 10 mm × 18 mm; resolution = 0.1 mm × 0.1 mm × 0.1
mm; two non-diffusion-weighted images (b₀); 30 diffusion directions;
and b = 2,000 s mm⁻². The total imaging time was approximately 6 h
for each specimen.

From the diffusion MRI data, diffusion tensors were calculated using
the log-linear fitting method implemented in DTIStudio (v.2 10 6;
http://www.mristudio.org) at each pixel. The mouse brain images were rigidly
aligned to an ex vivo mouse brain template in our MRI-based mouse
brain atlas using the large deformation diffeomorphic metric mapping
(LDDMM) method implemented in the DiffeoMap software (v.2 10 6)
(http://www.mristudio.org). To further determine the specific cortical
regions in the knockout mice that showed significant changes in local
tissue volume with respect to the control mice, voxel-based morphometric
analysis was also performed as described previously with the
false discovery rate (FDR) set at 0.05. Cortical volume was estimated
on the basis of the MRI data.
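
To make the FDR threshold concrete, here is a minimal Python sketch of one widely used FDR-controlling procedure, the Benjamini-Hochberg step-up method. The text above does not name the procedure that was actually applied in the voxel-based analysis, so the choice of Benjamini-Hochberg, the function name and the example p-values are all assumptions for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a test is significant at FDR q.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the largest
    rank k (1-based) with p_(k) <= (k / m) * q, and call the k smallest
    p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant


# Hypothetical per-voxel p-values, not data from the study.
flags = benjamini_hochberg([0.01, 0.02, 0.03, 0.5], q=0.05)
```

In a voxel-based analysis the same rule would be applied to one p-value per voxel, so `m` is the number of voxels tested.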


AFM 

To prepare samples for AFM, the dorsal telencephalon was dissected
from the embryonic brain in 1x DMEM-N2 medium (Thermo Fisher
Scientific) to expose the ventricular surface. Tissues were positioned
with the ventricular zone surface upward and mounted onto 50-mm
glass-bottom Fluorodish cell-culture dishes (World Precision Instruments)
coated with Cell-Tak tissue adhesive (Corning). Tissues were
then covered with 1x DMEM-N2 medium and recovered in a 5% CO₂
chamber at 37 °C for 1 h. Stiffness measurement was performed by
MFP-3D-BIO AFM (Asylum Research). An Axio Observer Z1 inverted
microscope (Zeiss) served as the AFM base (LD Plan-Neofluar 5x 0.15
NA objective) to locate the sample and position the cantilever tip over
the sample. A CP-CONT-BSG-C (sQube) probe with a 20-µm borosilicate
glass bead was used for all measurements. The Asylum Research GetReal
calibration method was used for the determination of the spring
constant (around 0.2 N m⁻¹). The trigger point was set to 10 nN with an
approach and retraction velocity of 5 µm s⁻¹. To determine the Young's
modulus, the force-indentation curves were fit to the Hertz model for
spherical tips through the Asylum Research software (v.16), with an
assumed Poisson's ratio value of 0.45 for the sample. Three distinct
spots (40 × 40 µm² in size) were measured for each piece of tissue. The
average stiffness of each spot was calculated for data analysis.
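
The Hertz fit mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than the Asylum Research implementation: it uses the standard Hertz relation for a rigid sphere indenting an elastic half-space, F = (4/3) · E/(1 − ν²) · √R · δ^(3/2), and assumes that the 20-µm bead size is a diameter (so R = 10 µm). The function names and the synthetic curve are hypothetical.

```python
import math


def hertz_force(delta_m, youngs_pa, bead_radius_m=10e-6, poisson=0.45):
    """Hertz force (N) for a rigid sphere at indentation depth delta_m (m)."""
    return (4.0 / 3.0) * youngs_pa / (1.0 - poisson ** 2) \
        * math.sqrt(bead_radius_m) * delta_m ** 1.5


def hertz_youngs_modulus(indentation_m, force_n, bead_radius_m=10e-6, poisson=0.45):
    """Least-squares estimate of Young's modulus (Pa) from a force-indentation curve."""
    x = [d ** 1.5 for d in indentation_m]
    # Slope of force against delta**1.5 for a line constrained through the origin.
    slope = sum(xi * fi for xi, fi in zip(x, force_n)) / sum(xi * xi for xi in x)
    return 0.75 * slope * (1.0 - poisson ** 2) / math.sqrt(bead_radius_m)


# Synthetic curve generated with a hypothetical modulus of 300 Pa.
depths = [0.5e-6, 1.0e-6, 1.5e-6, 2.0e-6]
forces = [hertz_force(d, 300.0) for d in depths]
estimate = hertz_youngs_modulus(depths, forces)
```

Because the synthetic forces are noise-free, the fit recovers the modulus used to generate them up to floating-point error; real curves would also need contact-point detection, which is omitted here.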


Acutely dissociated ventricular zone cell culture 

Wild-type or Cep83cKO embryos were dissected out and sectioned
using a vibratome (Leica Microsystems) at E15. The ventricular zone
of the cortex was isolated, incubated in a protease solution containing
10 units per ml papain (Fluka) in DMEM (Gibco) and triturated using a
fire-polished Pasteur pipette to create a single-cell suspension. Cells
were resuspended in a culture medium containing DMEM, glutamine,
penicillin/streptomycin, sodium pyruvate (Gibco), 1 mM
N-acetyl-L-cysteine (Sigma), B27 and N2, and plated onto coverslips coated with
poly-L-lysine (Sigma) in 24-well plates. The cultures were maintained in
a humidified incubator at 37 °C with a constant supply of 5% CO₂. About
8-12 h later, the cultures were fixed and analysed for YAP expression.


Quantification and statistical analysis 
For individual experiments, at least three wild-type and mutant mice or
brains from multiple litters were examined. For immunohistochemistry
experiments, multiple sections from individual brains were analysed.
No statistical methods were used to predetermine sample size. Sample
size was determined to be adequate on the basis of the magnitude and
consistency of measurable differences between groups. No randomization
of samples was performed. Mice subjected to the analyses were
littermates, age-matched and included both sexes. Investigators were
not blinded to mouse genotypes during experiments. Data are not
subjective but are based on quantitative analyses. The number of times
each experiment was repeated independently with similar results is
provided in the figure legends. Statistical significance was determined
using a chi-square or two-sided non-parametric Mann-Whitney U test,
and exact values from the tests are provided in the figures. Statistical
significance was defined as P < 0.05. Statistical tests were performed
with Prism (v.7, GraphPad). Effect sizes were calculated using Pearson's
r (chi-square) or U/(n1 × n2) (Mann-Whitney U test). Bar graphs indicate
mean ± s.e.m. Box plots indicate median (centre line), interquartile
range (box) and minimum and maximum values (whiskers).
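
The Mann-Whitney effect size U/(n1 × n2) can be worked through with a short Python sketch. It is only an illustration: the text does not say which of the two possible U values was used, so the version below (U counted for the first sample, with ties counted as one half) is an assumption, and the function names and example values are hypothetical.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: each pair with x_i > y_j counts 1, each tie 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def u_effect_size(x, y):
    """Effect size U / (n1 * n2): the fraction of pairs in which x exceeds y."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


# Hypothetical measurements for two groups, not data from the study.
complete_separation = u_effect_size([4, 5, 6], [1, 2, 3])
no_difference = u_effect_size([1, 2], [1, 2])
```

Under this convention a value of 0.5 means the two groups overlap completely, and values of 0 or 1 mean that every observation in one group lies below or above every observation in the other.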


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The datasets generated during and/or analysed during the current study 
are available from the corresponding authors on reasonable request. 
Extended Data Fig. 1 | Deletion of Cep83 in cortical RGPs. a, Schematic
diagram of the generation of the Cep83cKO mouse using a CRISPR-Cas9-mediated
double-nicking strategy. The DNA sequence at the top depicts the
sites targeted by a pair of guide RNAs (outlined in blue) downstream of the
critical exon 3 in the Cep83 gene. Green boxes represent exons, red triangles
represent loxP sites and yellow triangles represent FRT sites. NeoR,
neomycin-resistance gene cassette. b, Representative Southern blot showing the correct
gene targeting against the 5′-homology arm of the Cep83 floxed allele with the
presence of the deletion-specific 3.5-kb band (n = 3). c, Representative images
of wild-type and Cep83cKO cortices at E12.5, E13.5 and E15.5, stained for PCNT
(green) and ARL13B (a primary cilium marker; red), and with DAPI (blue) (n = 3).
High-magnification images of individual centrosomes are shown in the insets.
Note the loss of primary cilia in the Cep83cKO cortex by E13.5. Scale bars, 10 µm
(main image); 1 µm (inset). d, Representative images of P1 wild-type (n = 11) and
Cep83cKO (n = 12) cortices stained for CTIP2 (green) and CUX1 (red), and with
DAPI (blue). The asterisk indicates the folding in the medial region of the
Cep83cKO cortex. High-magnification images of the folding are shown to the
right. Scale bars, 500 µm (left); 200 µm (right). e, Representative images of the
medial region of P1 wild-type and Cep83cKO cortices stained for brain
lipid-binding protein (BLBP; green) and with DAPI (blue). High-magnification images
are shown to the right. Scale bars, 500 µm (left); 100 µm (right).
Extended Data Fig. 2 | Deletion of Cep83 in RGPs leads to increased
neurogenesis and gliogenesis. a, Representative images of the medial regions
of P21 wild-type and Cep83cKO cortices stained for FOXP2 (green) and SATB2
(red), and with DAPI (blue). Scale bar, 100 µm. b, Quantification of the number
of SATB2⁺ (left) and FOXP2⁺ (right) neurons per 250-µm column (wild type, n = 8
brains; Cep83cKO, n = 8 brains). c, Representative images of the dorsal regions
of P21 wild-type and Cep83cKO cortices stained for FOXP2 (green) and SATB2
(red), and with DAPI (blue). Scale bar, 100 µm. d, Quantification of the number
of SATB2⁺ (left) and FOXP2⁺ (right) neurons per 250-µm column (wild type, n = 6
brains; Cep83cKO, n = 5 brains). e, Representative images of P21 wild-type and
Cep83cKO cortices stained for OLIG2 (an oligodendrocyte marker; green) and
S100 (an astrocyte marker; red), and with DAPI (blue). Scale bar, 100 µm.
f, Quantification of the number of OLIG2⁺ oligodendrocytes (n = 10 regions
from 5 brains for each genotype) and S100⁺ astrocytes (n = 6 regions from 3
brains for each genotype) per 650-µm column. A two-sided Mann-Whitney U
test was used for statistical analysis. Bar charts show mean ± s.e.m. Box plots as
in Fig. 1.
Extended Data Fig. 3 | Deletion of Ift88 in RGPs does not lead to any obvious
defect in centrosome appendages, centrosome membrane anchorage or
cortical development. a, Representative images of E13.5 wild-type and
Ift88cKO ventricular zone surface stained for PCNT (green) and ARL13B (red),
and with DAPI (blue) (n = 3). High-magnification images of individual
centrosomes are shown in the insets. Note the complete loss of primary cilia in
the Ift88cKO cortex by E13.5. Scale bars, 10 µm (main image); 1 µm (inset).
b, Representative ssTEM images of E15.5 wild-type (top) and Ift88cKO (bottom)
ventricular zone surface showing individual centrosomes of RGPs in the apical
endfeet. High-magnification images (white dashed boxes) are shown on the
right. Note that the Ift88cKO mother centriole possesses the DAPs that are
anchored at the apical membrane (red arrows) and the sDAPs (yellow arrows),
but does not support any microtubule-based ciliary axoneme (wild type, n = 9
centrosomes; Ift88cKO, n = 20 centrosomes). All wild-type mother centrioles
were anchored to the apical membrane with microtubule-based cilia. All
Ift88cKO mother centrioles were anchored to the apical membrane, but none
possessed microtubule-based cilia. Scale bars, 800 nm (left); 200 nm (right).
c, Representative images of E15.5 wild-type and Ift88cKO ventricular zone
surface stained for PCNT (green) and CEP164 (red), and with DAPI (blue) (n = 3).
High-magnification images of individual centrosomes are shown in the insets.
Note the normal presence of CEP164 at the centrosome in the Ift88cKO cortex.
Scale bars, 10 µm (main images); 0.5 µm (insets). d, Representative images of
E15.5 wild-type (n = 6) and Ift88cKO (n = 6) ventricular zone surface stained for
PCNT (green) and ODF2 (an sDAP marker; red), and with DAPI (blue).
High-magnification images of individual centrosomes are shown in the insets. Note
the normal presence of ODF2 at the centrosome in the Ift88cKO cortex. Scale
bars, 5 µm (main image); 1 µm (inset). e, Representative whole-mount images of
P21 wild-type and Ift88cKO brains. Scale bar, 0.5 cm. f, Quantification of the
projected cortical area (n = 6 brains for each genotype). g, Images of P21
wild-type and Ift88cKO brain sections stained for CTIP2 (green) and CUX1 (red), and
with DAPI (blue). Scale bar, 0.5 mm. h, Quantification of the cortical area (wild
type, n = 4 brains; Ift88cKO, n = 4 brains). i, Images of the dorsal regions of P21
wild-type and Ift88cKO cortices stained for CTIP2 (green) and CUX1 (red), and
with DAPI (blue). Scale bar, 100 µm. j, Quantification of the number of CUX1⁺
(left) and CTIP2⁺ (right) neurons per 250-µm column (n = 8 brains for each
genotype). k, Representative images of P21 wild-type and Ift88cKO brain
sections along the rostrocaudal axis, stained with DAPI (grey) (n = 5). Note that
there is no obvious hydrocephalus in the Ift88cKO brain. Scale bar, 1 mm. A
two-sided Mann-Whitney U test was used for statistical analysis. Bar charts show
mean ± s.e.m. Box plots as in Fig. 1.
Extended Data Fig. 4 | Expression of SmoM2 in RGPs leads to cortical
dysplasia. a, Representative whole-mount images of P1 wild-type and SmoM2
brains (n = 5). Arrowheads indicate the agenesis of the olfactory bulb in the
SmoM2 brain. Scale bar, 0.5 cm. b, Representative images of P1 wild-type and
SmoM2 brain sections stained for CTIP2 (green) and CUX1 (red), and with DAPI
(blue) (n = 5). Arrowheads indicate the absence of corpus callosum in the
SmoM2 brain. The asterisk indicates the agenesis of the hippocampus in the
SmoM2 brain. Scale bar, 0.5 mm. c, Representative images of P1 wild-type and
SmoM2 cortices stained for CTIP2 (green) and CUX1 (red), and with DAPI (blue)
(n = 4). Note the drastic disorganization of the SmoM2 cortex. Scale bar, 100 µm.
d, Representative images of E15.5 wild-type and SmoM2 ventricular zone
surface stained for PCNT (green) and ARL13B (red), and with DAPI (blue) (n = 3).
High-magnification images of individual centrosomes are shown in the insets.
Note the presence of the primary cilium at the SmoM2 centrosome. Scale bars,
5 µm (main image); 1 µm (inset). e, Representative images of E15.5 wild-type and
SmoM2 ventricular zone surface stained for PCNT (green) and CEP89 (a DAP
marker; red), and with DAPI (blue) (n = 3). High-magnification images of
individual centrosomes are shown in the insets. Note the normal presence of
CEP89 at the SmoM2 centrosome. Scale bars, 5 µm (main image); 1 µm (inset).
f, Representative images of E15.5 wild-type and SmoM2 ventricular zone surface
(n = 5 brains) stained for PCNT (green) and ODF2 (red), and with DAPI (blue)
(n = 3 brains). High-magnification images of individual centrosomes are shown
in the insets. Note the normal presence of ODF2 at the SmoM2 centrosome.
Scale bars, 5 µm (main image); 1 µm (inset).
Extended Data Fig. 5 | Deletion of Cep83 does not affect the densities of RGPs
and intermediate progenitor cells at E13.5, but leads to increased densities
of these cells in the dorsomedial cortex at E15.5. a, Representative images of
E13.5 wild-type and Cep83cKO cortices stained for PAX6 (green) and TBR2
(red), and with DAPI (blue). Scale bar, 50 µm. b, c, Quantification of the number
of PAX6⁺ (b) and TBR2⁺ (c) cells per 250-µm column in a. Wild type, n = 8 brains;
Cep83cKO, n = 8 brains. d, Images of E12.5 wild-type and Cep83cKO cortices
(dorsomedial region) that were subjected to EdU (red) and BrdU (green)
sequential pulse-chase labelling. Cortices were stained for PAX6 (grey), and
with DAPI (blue). Example regions (white dashed boxes) are shown at the
bottom. Scale bars, 25 µm. e, f, Quantification of the percentage of EdU⁺BrdU⁺
cells among the total EdU⁺ cells in the ventricular zone (e), and the number of
BrdU⁺ cells in the ventricular zone per 250-µm column (f) (wild type, n = 6
brains; Cep83cKO, n = 6 brains). g, Images of E15.5 wild-type and Cep83cKO
cortices (dorsomedial region) stained for PAX6 (green) and TBR2 (red), and
with DAPI (blue). Scale bar, 50 µm. h, i, Quantification of the number of PAX6⁺
(h) or TBR2⁺ (i) cells per 250-µm column (wild type, n = 8 brains; Cep83cKO, n = 6
brains). j, Quantification of the distribution width of TBR2⁺ cells in wild-type or
Cep83cKO cortices (E13.5, n = 4 brains for each genotype; E15.5, n = 5 brains for
each genotype). A two-sided Mann-Whitney U test was used for statistical
analysis. Box plots as in Fig. 1.
Extended Data Fig. 6 | Increased mitotic cells in the subventricular zone of
the Cep83cKO cortex are predominantly intermediate progenitor cells.
a, Images of E15.5 wild-type and Cep83cKO cortices stained for P-HH3 (a mitotic
cell marker; red), and with DAPI (blue). Scale bar, 50 µm. b, d, Images of E15.5
wild-type and Cep83cKO cortices stained for P-HH3 (red) and TBR2 (green) (b)
or SOX2 (an RGP marker; green) (d), and with DAPI (blue). High-magnification
images of individual P-HH3⁺ cells are shown at the bottom. Note that P-HH3⁺
cells in the subventricular zone of the Cep83cKO cortex are predominantly
TBR2⁺ but SOX2⁻. Scale bars, 25 µm (top); 10 µm (bottom). c, Images of E15.5
wild-type and Cep83cKO cortices stained for P-HH3 (red) and
phospho-vimentin (P-VIM; green), and with DAPI (blue). High-magnification images of
individual P-HH3⁺ cells are shown at the bottom. Scale bars, 25 µm (top); 10 µm
(bottom). e, f, Quantification of the number of apical (e) and basal (f) P-HH3⁺
cells per 250-µm column (wild type, n = 16 brains; Cep83cKO, n = 14 brains).
g, h, Quantification of the percentage of P-HH3⁺ cells in the subventricular zone
that are TBR2⁺ (g; lateral, n = 4 brains for each genotype; medial, n = 3 brains for
each genotype) or SOX2⁺ (h; n = 4 brains for each genotype). i, Quantification of
the percentage of P-HH3⁺ cells without a P-VIM-labelled basal radial glial fibre
(lateral, n = 4 brains for each genotype; medial, n = 3 brains for each genotype).
j-l, Representative images of E15.5 wild-type and Cep83cKO cortices stained
for PAX6 (green) and three previously suggested markers of outer
subventricular zone RGPs (HOPX (j), PTPRZ1 (k) or TNC (l)) (red), and with DAPI
(blue) (n = 4). Note that there is no obvious increase in the expression of HOPX,
PTPRZ1 or TNC in the Cep83cKO cortex and low expression in both wild-type
and Cep83cKO cortices. Scale bars, 50 µm. A two-sided Mann-Whitney U
test was used for statistical analysis. Box plots as in Fig. 1.
Extended Data Fig. 7 | Disruption of other components of the DAP assembly
pathway leads to the overproduction of RGPs and intermediate progenitor
cells. a, Diagram of the hierarchical DAP assembly pathway. b-d, Western blot
assays to show the efficacy of shRNAs against Cep83 (b), Cep89 (c) or Sclt1 (d) in
suppressing protein expression (n = 3). e, Representative images of E16.5
cortices that received in utero electroporation of EGFP (green) together with
shRNAs against Cep83 or against Cep89 and Sclt1 at E13.5. Cortices were stained
for PAX6 (red, top) and TBR2 (red, bottom), and with DAPI (blue). Note that
expression of Cep83 shRNA-a (which effectively suppresses protein
expression), but not that of Cep83 shRNA-c (which does not suppress protein
expression), leads to a significant increase in both PAX6⁺ RGPs in the
ventricular zone and TBR2⁺ intermediate progenitor cells in the subventricular
zone. Moreover, expression of Cep89 shRNA-c and Sclt1 shRNA-c (which
effectively suppress protein expression), but not that of Cep89 shRNA-b and
Sclt1 shRNA-b (neither of which suppresses protein expression), results in a
similar increase in both PAX6⁺ RGPs and TBR2⁺ intermediate progenitor cells.
Scale bars, 100 µm. f, g, Quantification of the percentage of EGFP⁺ cells that are
PAX6⁺ (f) or TBR2⁺ (g). Note that, similar to the Cep83cKO, expression of Cep83
shRNA-a and Cep89 and Sclt1 shRNA-c, but not that of Cep83 shRNA-c or Cep89
and Sclt1 shRNA-b, leads to a significant increase in PAX6⁺ RGPs and TBR2⁺
intermediate progenitor cells, indicating that disruption of other DAP
components causes excessive production of RGPs and intermediate
progenitor cells in a similar manner to the removal of CEP83 (control, n = 4
brains (f) and 5 brains (g); Cep83 shRNA-a, n = 5 brains; Cep89 and Sclt1 shRNA-c,
n = 4 brains; Cep83 shRNA-c, n = 6 brains; Cep89 and Sclt1 shRNA-b, n = 4 brains).
h, Quantification of the percentage of EGFP⁺ cells in different cortical regions
(control, n = 5 brains; Cep83 shRNA-a, n = 5 brains; Cep89 and Sclt1 shRNA-c, n = 4
brains; Cep83 shRNA-c, n = 7 brains; Cep89 and Sclt1 shRNA-b, n = 5 brains). A
two-sided Mann-Whitney U test was used for statistical analysis. Bar charts
show mean ± s.e.m. Box plots as in Fig. 1.
Extended Data Fig. 8 | Detachment of the centrosome from the apical
membrane does not disrupt RGP polarity, junction formation or radial glial
fibre scaffolding. a, Representative images of E15.5 wild-type and Cep83cKO
cortices in coronal sections stained for α-tubulin (α-TUB; green), and with DAPI
(blue) (n = 3). Note that there is no obvious difference in non-apical membrane
microtubules between the wild-type and Cep83cKO cortices. Scale bar, 10 µm.
b, Representative en face images of mitotic RGPs in E15.5 wild-type and
Cep83cKO cortices stained for α-tubulin (green), and with DAPI (blue) (n = 12).
Note that there is no obvious difference in microtubule spindles between the
wild-type and Cep83cKO RGPs. Scale bar, 2 µm. c, Representative images of
E15.5 wild-type and Cep83cKO cortices stained for partitioning defective
protein 3 (PARD3; red), an evolutionarily conserved polarity protein, and with
DAPI (blue) (n = 3). Scale bar, 10 µm. d, Representative images of E15.5 wild-type
and Cep83cKO cortices stained for three junction markers (β-catenin (β-CAT;
left), N-cadherin (N-CAD; middle) or ZO-1 (right)) (red), and with DAPI (blue).
Scale bar, 50 µm. e-g, Quantification of the staining intensity of β-catenin
(e; wild type, n = 8 brains; Cep83cKO, n = 5 brains), N-cadherin (f; wild type, n = 4
brains; Cep83cKO, n = 3 brains) or ZO-1 (g; wild type, n = 8 brains; Cep83cKO,
n = 7 brains) at the ventricular zone surface. h, i, Representative images of E15.5
wild-type and Cep83cKO cortices stained for nestin (h) or BLBP (i) (green), and
with DAPI (blue) (n = 5). Scale bars, 50 µm. A two-sided Mann-Whitney U test
was used for statistical analysis. Box plots as in Fig. 1.


Extended Data Fig. 9 | Centrosome detachment leads to enlargement of the
apical membrane, and nuclear expression of YAP is low in TBR2⁺
intermediate progenitor cells and dissociated RGPs in culture.
a, Representative en face segmented images of wild-type and Cep83cKO
ventricular zone surface at E13.5 and E15.5. Each apical domain is colour-coded
on the basis of its size: blue colour indicates an apical domain that is relatively
larger. Scale bar, 10 µm. b, Quantification of the size of the apical domain of
wild-type and Cep83cKO RGPs at E13.5 and E15.5 (E13.5: wild type, n = 5,038
apical domains from 8 embryos; Cep83cKO, n = 2,891 apical domains from 6
embryos; E15.5: wild type, n = 4,780 apical domains from 12 embryos;
Cep83cKO, n = 1,959 apical domains from 8 embryos). c, Quantification of the
size of the apical domain of interphase and mitotic wild-type and Cep83cKO
RGPs at E15.5 (wild type, n = 1,703 interphase apical domains and n = 145 mitotic
apical domains from 4 embryos; Cep83cKO, n = 988 interphase apical domains
and n = 83 mitotic apical domains from 4 embryos). d, Representative images of
the subventricular zone of E15.5 wild-type and Cep83cKO cortices stained for
YAP (green) and TBR2 (red), and with DAPI (blue) (n = 5). Individual TBR2⁺
intermediate progenitor cells are shown in the insets. Note the low expression
of YAP in the nuclei of TBR2⁺ intermediate progenitor cells in the subventricular
zone of the wild-type and Cep83cKO cortices. Scale bars, 50 µm (main image);
5 µm (inset). e, Representative images of acutely dissociated cell cultures of
E15.5 wild-type and Cep83cKO cortical ventricular zone stained for SOX2 (red)
and YAP (green), and with DAPI (blue). Scale bar, 20 µm. f, Quantification of the
YAP staining intensity in SOX2⁺ RGPs (wild type, n = 13 brains; Cep83cKO, n = 8
brains). A two-sided Mann-Whitney U test was used for statistical analysis. Box
plots as in Fig. 1.
Extended Data Fig. 10 | Excessive neurogenesis in the Cep83cKO cortex
depends on excessive expression and activation of YAP. a, Representative
high-magnification images of P21 wild-type, YapcKO, Cep83cKO and
Cep83YapcDKO cortices stained for CTIP2 (green) and CUX1 (red), and with
DAPI (blue). Scale bar, 100 µm. b, Quantification of the number of CUX1⁺
neurons per 250-µm column (n = 8 brains for each genotype). c, Representative
en face images of coronal sections of E15.5 wild-type, YapcKO, Cep83cKO and
Cep83YapcDKO cortices stained for α-tubulin (green) and actin (red), and with
DAPI (blue). Scale bar, 5 µm. d-f, Quantification of the intensity of microtubules
per apical domain (d), individual apical domain size (e) and the intensity of
PCNT per apical domain (f) (wild type, n = 1,730 apical domains from 4
embryos; YapcKO, n = 1,074 apical domains from 3 embryos; Cep83cKO, n = 456
apical domains from 3 embryos; Cep83YapcDKO, n = 540 apical domains from 3
embryos). A two-sided Mann-Whitney U test was used for statistical analysis.
Bar charts show mean ± s.e.m. Box plots as in Fig. 1.
Software and code


Policy information about availability of computer code 


Data collection FluoView (version 4.2, Olympus) and NDP viewer (version 2, Hamamatsu Photonics) were used for generating fluorescence image data. AMT
(version 7.0.0.95) was used for generating electron microscopy data. MRtrix 3.1 and DtiStudio 3.0 (v2 10 6, http://www.mristudio.org) were used for
generating MRI data. ZEN 2.1 (version 11.0) was used for generating bright-field images. Asylum Research Software (version 16) was used for generating
atomic force microscopy data.


Data analysis Volocity (version 6.3, PerkinElmer), ImageJ/Fiji (1.52, NIH) and its plug-ins (Tissue Analyzer and MorphoLibJ), MATLAB (version 2016b, MathWorks) and
Imaris (9.0.1, Oxford Instruments) were used for fluorescence image analysis. DTIStudio and DiffeoMap (v2 10 6, http://www.mristudio.org) were used
for MRI data analysis. Prism (version 7, GraphPad) was used for statistical analysis. MATLAB (version R2016b, MathWorks) was used for generating
segmented images of apical domains.


The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request. 
Sample size No sample-size calculations were performed. Sample size was determined to be adequate based on the magnitude and consistency of
measurable differences between groups.


Data exclusions Data were only excluded for failed experiments. 


Replication The numbers of times each experiment was repeated independently with similar results were provided in the figure legends.
Randomization No randomization of samples was performed.
Antibodies used The supplier name, catalog number, lot number and dilution for all the antibodies were provided in the Methods.


Validation All the primary antibodies used in the study have been used and validated in the previous studies. The antibody registry ID (RRID) 
for each primary antibody was provided in the Methods. 


Eukaryotic cell lines 
Cell line source(s) Rockefeller University Gene Targeting Resource Centre - W4 mouse embryonic stem (ES) cell line of 129S6 background
Authentication The authors did not authenticate the referred cell line.
Laboratory animals The following mouse strains including Cep83fl/fl, Actin-Flp (Jax#005703), Emx1-Cre (Jax#005628), Ift88fl/fl, R26-LSL-SmoM2, and
Field-collected samples No field-collected sample was used.


Ethics oversight All animal procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of the Memorial Sloan
Kettering Cancer Center, New York, USA, and Tsinghua University, Beijing, China.
The segmental organization of the vertebral column is established early in
embryogenesis, when pairs of somites are rhythmically produced by the presomitic
mesoderm (PSM). The tempo of somite formation is controlled by a molecular
oscillator known as the segmentation clock. Although this oscillator has been
well-characterized in model organisms, whether a similar oscillator exists in humans
remains unknown. Genetic analyses of patients with severe spine segmentation
defects have implicated several human orthologues of cyclic genes that are associated
with the mouse segmentation clock, suggesting that this oscillator might be
conserved in humans. Here we show that human PSM cells derived in vitro, as well as
those of the mouse, recapitulate the oscillations of the segmentation clock. Human
PSM cells oscillate with a period two times longer than that of mouse cells (5 h versus
2.5 h), but are similarly regulated by FGF, WNT, Notch and YAP signalling. Single-cell
RNA sequencing reveals that mouse and human PSM cells in vitro follow a
developmental trajectory similar to that of mouse PSM in vivo. Furthermore, we
demonstrate that FGF signalling controls the phase and period of oscillations,
expanding the role of this pathway beyond its classical interpretation in 'clock and
wavefront' models. Our work identifying the human segmentation clock represents
an important milestone in understanding human developmental biology.


In the mouse, the early stages of paraxial mesoderm development can
be recapitulated in vitro from mouse embryonic stem (ES) cells by first
inducing an epiblast fate with activin A and FGF, followed by culture
in medium containing the WNT agonist CHIRON99021 (Chir) and the
BMP inhibitor LDN193189 (LDN) (Chir-LDN medium; hereafter, CL
medium) (Fig. 1a, Extended Data Fig. 1a-c). After 24 h in CL medium,
epiblast-like cells acquire a neuromesodermal progenitor or anterior
primitive streak fate, expressing T (also known as Brachyury), Sox2
and Pou5f1 (also known as Oct4) (Fig. 1a, Extended Data Fig. 1b, c). By
48 h, cells activate the PSM markers Tbx6 and Msgn1 (Fig. 1a, Extended
Data Fig. 1b-e). This transition to PSM is paralleled by an
epithelium-to-mesenchyme transition, marked by a switch from Cdh1 to Cdh2
(Extended Data Fig. 1b).

To further characterize the identity of these mouse PSM cells gen- 
erated in vitro, we benchmarked their transcriptomes against the 
embryonic mouse PSM. Using single-cell RNA sequencing (scRNA- 
seq)’, we analysed 5,646 cells dissociated from the posterior region 
of two mouse embryos at embryonic day (E)9.5. Clustering analysis 
revealed 21 distinct cell states that correspond to expected deriva- 
tives of all three germ layers (Extended Data Fig. 2a-d, Supplementary 
Table 1). Transcriptomes of paraxial mesoderm and neural tube cells, 
which share acommon developmental origin’, were represented as 


a k-nearest neighbour (k-NN) graph (Fig. 2a). Genes that were differ- 
entially expressed between cell clusters (Extended Data Fig. 3a—d) and 
along a pseudotemporal trajectory (Fig. 2b, Supplementary Table 2) 
stratified distinct phases of paraxial mesoderm differentiation as fol- 
lows. One cluster, which coexpressed Sox2 and T, represented neu- 
romesodermal progenitors and was positioned between the posterior 
neural tube and paraxial mesoderm clusters, consistent with the known 
bipotentiality of these cells’. Two clusters that expressed T, Rspo3, 
Tbx6, Dll3 and Foxc1 represented mesodermal precursor cells" and the 
more-mature posterior PSM. These two clusters also express the Notch- 
pathway genes Hes7, Lfng, Dll1 and Dll3 (Extended Data Fig. 3c–e), and 
probably correspond to the in vivo oscillatory domain. The next cluster 
corresponds to the anterior PSM, which is marked by expression of 
Mesp1 and Ripply2 (Fig. 2b). 

We compared the transcriptomes of these in vivo cell states of the E9.5 
mouse to those of 21,478 mouse ES cells differentiated in vitro. Cluster- 
ing analyses indicated the rapid differentiation of mouse ES cells over 
the first three days, with each time point largely dominated by a single 
cluster: naive ES cells (day 0), epiblast (day 2) and neuromesodermal 
progenitors or anterior primitive streak (day 3), followed by asynchro- 
nous transcriptional changes over the final two days (Fig. 2c, Extended 
Data Fig. 3f, g). A substantial proportion of the differentiating mouse 
ES cells adopted a fate trajectory similar to that of the cells in vivo, by 
progressively expressing Sox2, T, Rspo3, Tbx6, Dll3 and Foxc1 (Fig. 2d, 
Extended Data Fig. 3h–j). Approximately 46% of differentiating mouse 
ES cells ultimately adopted a state similar to that of the posterior PSM 
(Fig. 2c, Extended Data Fig. 3g). 

Fig. 1 | Recapitulation of the mouse and human segmentation clocks in vitro 
by differentiation of pluripotent stem cells towards PSM fate. a, 
Immunofluorescence for stage-specific markers (left) and images of the mouse 
ES cell pMsgn1-Venus reporter or human iPS cell MSGN1-Venus reporter (right) 
in differentiating mouse and human pluripotent stem cells. Scale bar, 100 μm. 
n = 7 independent experiments. MPCs, mesodermal precursor cells; NMPs, 
neuromesodermal progenitors. b, Normalized HES7-Achilles intensity profiles 
for three PSM cells derived from mouse ES cells, imaged in CLFBR medium. 
n = 17 independent experiments. Normalized fluorescence intensity is 
expressed in arbitrary units. c, Period of HES7-Achilles oscillations in PSM cells 
derived from mouse ES cells or human iPS cells, cultured in CLFBR medium. 
Mean ± s.d. n = 25 independent experiments. d, Heat map of HES7-Achilles 
intensity over time in PSM cells derived from mouse ES cells, in CLFBR medium. 
Each row represents one cell. n = 15 cells. e, Normalized HES7-Achilles intensity 
profiles for three PSM cells derived from human iPS cells, imaged in CLFBR 
medium. n = 23 independent experiments. f, Heat map of HES7-Achilles 
intensity over time in PSM cells derived from human iPS cells, in CLFBR 
medium. Each row represents one cell. n = 15 cells. 

We trained a k-NN classifier on the transcriptional signatures of the 
cell clusters of the E9.5 mouse, and used it to assign identities to indi- 
vidual cells derived from mouse ES cells on days 4 and 5 of differentia- 
tion. An identity similar to that of the posterior PSM cells of the E9.5 
mouse was the most-abundantly classified state within cells of the 
posterior PSM cluster of mouse ES cells at days 4 and 5 (Fig. 2g). States 
classified as posterior PSM in the E9.5 mouse were enriched amongst 
the posterior PSM branch of the k-NN graph of mouse ES cells at days 4 
and 5 (Fig. 2h), and similar enrichments were observed using three 
classification algorithms (Extended Data Fig. 4a). We also detected a 
collinear trend in the expression of Hox genes during the differentia- 
tion of mouse ES cells (Extended Data Fig. 4b). Together, these results 
suggest a broad transcriptional similarity between paraxial mesoderm 
cells derived from mouse ES cells and their in vivo counterparts. 
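
The classification step described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the two-dimensional "expression" vectors, the labels and the value of k are hypothetical, and plain Euclidean distance stands in for whatever distance was applied to the scRNA-seq data. The sketch assigns a query cell the majority label of its k nearest reference cells and reports the fraction of neighbours carrying that label, analogous to the classifier score overlaid in Fig. 2h.

```python
import math
from collections import Counter


def knn_classify(query, train_points, train_labels, k=3):
    """Assign `query` the majority label among its k nearest training points.

    Returns (label, fraction), where fraction is the share of the k nearest
    neighbours that carry the winning label.
    """
    # Pair each reference point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(query, point), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count labels among the k closest reference points.
    votes = Counter(label for _, label in distances[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical reference cells: (marker A, marker B) values, illustration only.
reference_cells = [
    (5.0, 0.5), (4.5, 0.8), (5.2, 0.3),   # labelled "posterior PSM"
    (0.5, 4.8), (0.8, 5.1), (0.3, 4.5),   # labelled "neural tube"
]
reference_labels = ["posterior PSM"] * 3 + ["neural tube"] * 3

# A hypothetical in vitro cell to classify against the reference.
predicted_label, neighbour_fraction = knn_classify(
    (4.8, 0.6), reference_cells, reference_labels, k=3
)
print(predicted_label, neighbour_fraction)
```

In practice the reference vectors would be high-dimensional (for example, principal components of normalized counts), and the choice of k and of distance metric would need to be justified against the data.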

Oscillations of a Hes7-luciferase reporter in PSM cells differenti- 
ated from mouse ES cells in 3D cultures have recently been reported”. 
To visualize oscillations of the segmentation clock in two dimensions, 
we generated a mouse ES cell reporter line in which a destabilized ver- 
sion of the yellow fluorescent protein variant Achilles was knocked in 
the 3’ end of the Hes7 gene” (Extended Data Fig. 1g). When differenti- 
ated towards PSM, a subset of cells showed oscillatory expression of 
the Hes7-Achilles gene, with a period of 2.5 ± 0.4 h (n = 25 independent 
experiments), similar to the period of the segmentation clock in mouse 
embryos" (Fig. 1b–d, Extended Data Fig. 1h–i, Supplementary Video 1). 
This oscillatory state could be extended by adding FGF4, the retinoic 
acid inhibitor BMS493 and the Rho kinase inhibitor (ROCKi) Y-27632 
to the CL medium (hereafter, CLFBR medium) (CL medium, 45 ± 6.6 h 
(n = 8 independent experiments) versus CLFBR medium, 61.2 ± 5.7 h 
(n = 12 independent experiments)) (Extended Data Fig. 1j, k). Therefore, 
PSM cells differentiated from ES cells in vitro can reliably model the 
segmentation clock. 

We next implemented a similar in vitro strategy to identify the human 
oscillator. Human induced pluripotent stem (iPS) cells differentiated 
in CL medium acquire a neuromesodermal progenitor or anterior 
primitive streak fate, characterized by T (also known as TBXT) and 
SOX2 expression, after 24 h (Fig. 1a), and a PSM fate marked by MSGN1 
and TBX6 expression after 48 h (Fig. 1a, Extended Data Fig. 1f). A CDH1- 
to-CDH2 switch is also observed, as in mouse ES cells (Extended Data 
Fig. 1b). The induction efficiency of human cells carrying a MSGN1- 
Venus knock-in reporter was markedly high compared to that in mouse, 
reaching 92.6 ± 1.5% (n = 8 independent experiments) (Extended Data 
Fig. 1d, e, Supplementary Video 2). 

We compared 14,750 differentiating human iPS cells analysed by 
scRNA-seq to the in vivo and in vitro mouse-cell states. Early collec- 
tion time points clustered uniformly and sequentially along the k-NN 
graph, whereas the final two time points displayed continuous and 
overlapping transcriptional features (Extended Data Fig. 3k, l). Differ- 
ential gene expression and pseudotemporal ordering analyses revealed 
shared molecular characteristics between the human clusters and 
both the in vivo and in vitro mouse PSM lineages (Fig. 2e, f, Extended 
Data Fig. 3m-o). Cells collected after 1 day exhibited characteristics 
of neuromesodermal progenitors or anterior primitive streak cells, 
showing expression of NODAL, T, MIXL1 and SOX2. By day 2, human 
cells resembled the mouse mesodermal precursor cell and posterior 
PSM clusters, showing expression of T, MSGN1, TBX6, DLL3, WNT3A 
and FGF17, as well as the Notch-associated cyclic genes LFNG and HES7. 
At days 3 and 4, cells show the expression of markers of anterior PSM, 
such as FOXC1 (Fig. 2f, Extended Data Fig. 3n, o). Machine-learning 
classifiers trained on the mouse embryonic cell states consistently 
assigned an identity similar to that of the posterior PSM cluster of E9.5 
mouse to clusters of human iPS cells on days 2-4 (Fig. 2g, h, Extended 
Data Fig. 4c). We detected collinear activation of HOX gene clusters, 
beginning with HOXA1 and HOXA3 on day 1 and culminating with HOXB9 
and HOXC8 on day 4 (Extended Data Fig. 4d). Thus, the differentiation 
of human iPS cells to a PSM fate in vitro in CL medium recapitulates a 
developmental sequence similar to that of the mouse embryo, leading 
to the production of trunk paraxial mesoderm cells. 

To assess whether PSM cells derived from human iPS cells exhibit 
segmentation clock oscillations, we generated a HES7-Achilles iPS cell 
reporter cell line (Extended Data Fig. 1g). After 48 h in CL medium, 
most cells started to show reporter oscillations with a mean period of 
4.9 ± 0.3 h (n = 25 independent experiments) and constant frequency 
(Fig. 1c, e, f, Extended Data Fig. 1l–p, Supplementary Videos 3, 4). No 
oscillations could be detected when LDN was omitted, consistent with 
the need for BMP4 inhibition to induce the paraxial mesoderm fate” 
(Extended Data Fig. 1q). The total number of oscillations observed 
could be approximately doubled by culturing in CLFBR medium (CL 
medium 4.7 ± 0.8 oscillations versus CLFBR medium 10.2 ± 1.6 oscil- 
lations (n = 15 independent experiments)) (Extended Data Fig. 1r, s). 
These experiments support the existence of a human segmentation 
clock that ticks with an approximately 5-h period. 
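
As an illustration of how a period can be read off a reporter trace, the following sketch estimates the period as the mean interval between successive local maxima. This is an assumption made for illustration only; the text above does not specify the period-extraction method, and real traces would normally require detrending and smoothing first. The synthetic 5-h cosine trace and the function name are hypothetical.

```python
import math


def estimate_period(times, values):
    """Estimate the oscillation period as the mean peak-to-peak interval.

    A sample counts as a peak if it is larger than the previous sample and
    at least as large as the next one. Boundary samples are ignored.
    """
    peak_times = [
        times[i]
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]
    if len(peak_times) < 2:
        raise ValueError("need at least two peaks to estimate a period")
    intervals = [later - earlier for earlier, later in zip(peak_times, peak_times[1:])]
    return sum(intervals) / len(intervals)


# Synthetic, noise-free trace sampled every 0.1 h with a 5-h period (illustration only).
sample_times = [i * 0.1 for i in range(300)]
sample_values = [1.0 + math.cos(2.0 * math.pi * t / 5.0) for t in sample_times]

period_hours = estimate_period(sample_times, sample_values)
print(f"estimated period: {period_hours:.2f} h")
```

Peak picking is sensitive to noise, so for experimental data a spectral or Hilbert-transform approach may be preferable.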

Fig. 2 | scRNA-seq analysis of differentiating mouse and human PSM. a, k-NN 
graph of mouse neural tube, PSM and somite clusters at E9.5 (2,340 cells, 20 
principal component dimensions), visualized with ForceAtlas2 and coloured 
using Louvain cluster identities. p, posterior; a, anterior. b, Pseudotemporal 
ordering of non-neural cells at E9.5. Heat maps illustrate genes with significant 
dynamic expression (exp.) ordered by peak expression (Methods), and selected 
markers of paraxial mesoderm differentiation. Colour bars indicate 
pseudotemporal position with approximate locations of Louvain cluster 
centres indicated. Dotted line marks the determination front (the boundary 
between the anterior and posterior PSM). SOM, somite. c, Batched-balanced 
k-NN graph of single-cell transcriptomes of mouse ES cells (21,478 cells), 
coloured by Louvain cluster identity and visualized with ForceAtlas2. Cell 
numbers for the three terminal day-4 and day-5 states are indicated. Epi, 
epiblast. d, Pseudotemporal ordering of mouse ES cells along a path towards 
the putative PSM state at days 4 and 5. The heat map shows selected markers 
of paraxial mesoderm differentiation. e, Batched-balanced k-NN graph 
(ForceAtlas2 layout) of single-cell transcriptomes of human iPS cells 
(14,750 cells), coloured by Louvain cluster identities. f, Pseudotemporal 
ordering of human iPS cells along a path towards the terminal PSM state at days 
3 and 4. The heat map shows selected markers of paraxial mesoderm 
differentiation. g, Machine-learning classification of human and mouse in vitro 
cultured cells. A k-NN classifier trained on clusters of the E9.5 mouse was used 
to predict identities of terminal in vitro states (inset, red cells). The heat maps 
depict the fraction of E9.5 assignments for mouse ES cells at day 4 and 5 and 
human iPS cells at days 2-4. h, Overlay of k-NN classifier scores (fraction of 
nearest neighbours with the posterior PSM label of the E9.5 mouse) onto the 
mouse ES cell and human iPS cell k-NN graphs. 

A characteristic property of the segmentation-clock oscillations 
in vivo is their high local synchrony’”. Synchronization of oscillations 
appears to be recapitulated in vitro in human, but not mouse, PSM 
cells (Fig. 1d, f). To track individual PSM cells derived from human iPS 
cells, we diluted HES7-Achilles reporter cells that express a nuclear 
label (pCAG-H2B-mCherry) in an excess of unlabelled cells (Fig. 3a, 
Extended Data Fig. 5a, Supplementary Video 5). The average diffusion 
of cells in vitro (2.4 ± 2.2 square micrometres per minute) (Extended 
Data Fig. 5b) was comparable to that of chicken-embryo PSM cells 
in vivo (0.5-8 square micrometres per minute)”. Analysis of the phase 
of individual oscillators did not reveal any spatial structure, arguing 
against the existence of travelling waves in these cultures (Extended 
Data Fig. 5c, Supplementary Video 6). Tracking large numbers of cells 
enabled us to assess quantitatively the degree of global synchrony 
using the Kuramoto order parameter". This analysis confirmed that 
cells oscillate in synchrony, as the order parameter was significantly 
higher relative to a model with randomized phases (0.43 ± 0.15 versus 
0.094 ± 0.09, paired two-sided t-test P=5 x10 (n = 139 cells)) (Fig. 3b, 
c, Extended Data Fig. 5d–f). 
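
For readers unfamiliar with the Kuramoto order parameter, the sketch below computes it in its standard form, R = |(1/N) Σ exp(iθ_j)|, for a list of oscillator phases at one time point. R is close to 1 when phases are aligned and close to 0 when they are spread evenly around the cycle. The phase lists are invented for illustration, and how phases are extracted from fluorescence traces is not addressed here.

```python
import cmath
import math


def kuramoto_order_parameter(phases):
    """Return R = |mean(exp(i * theta))| for phases given in radians."""
    if not phases:
        raise ValueError("at least one phase is required")
    mean_vector = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical phase snapshots (radians), illustration only.
aligned_phases = [0.3] * 10                                  # perfectly synchronous
spread_phases = [2.0 * math.pi * k / 8 for k in range(8)]    # evenly spread

r_aligned = kuramoto_order_parameter(aligned_phases)
r_spread = kuramoto_order_parameter(spread_phases)
print(f"aligned: {r_aligned:.3f}, evenly spread: {r_spread:.3f}")
```

Applying the same function to phases drawn at random gives a baseline against which an observed value can be compared, in the spirit of the randomized-phase control described above.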

The Kuramoto order parameter decreased over time, indicating a 
progressive decay of synchrony (Fig. 3c, Extended Data Fig. 5d, f). This 
prompted us to explore cell division as a potential source of increas- 
ing noise over time. Cell division was not temporally coordinated 
between cells—roughly 5% of cells were in M phase at any given point 
(Extended Data Fig. 5g, h). The cell-cycle time was 22 ± 3.6 h (n = 26 cells), 
indicating that division takes place on a time scale different to that of 
HES7 oscillations (Extended Data Fig. 5i). The ratio between cell-divi- 
sion time and clock period is the same as observed in vivo for chicken 
PSM’?”°. The distribution of phases at mitosis was evenly spread, sug- 
gesting a lack of correlation between the phase of HES7-Achilles oscil- 
lation and cell division (Extended Data Fig. 5j). Inhibiting cell division 
with aphidicolin (Extended Data Fig. 5h) did not affect oscillations 
or order-parameter dynamics (control 0.404 ± 0.2065 (n = 45 cells) 
versus aphidicolin 0.3465 ± 0.1526 (n = 48 cells), paired two-sided 
t-test, P = 0.348) (Extended Data Fig. 5k–m). Thus, cell division is not 
an important source of noise for HES7-Achilles oscillations in human 
PSM cells in vitro. 

Fig. 3 | Synchronization of individual oscillators within human PSM 
cultures. a, Experimental strategy for automated tracking of HES7-Achilles 
oscillations in individual cells. Scale bar, 100 μm. AU, arbitrary units. 
b, Kuramoto order parameter for HES7-Achilles cells versus the same dataset 
with randomized phases. Mean ± s.d. Paired two-sided t-test, P=5 x107”. 
n = 139 cells. c, Kuramoto order parameter time course of HES7-Achilles human 
PSM cells. Synchronization threshold shown as mean ± s.d. of the Kuramoto 
order parameter for same dataset, but with randomized phases. n = 139 cells. d, 
Average intensity profiles for individual HES7-Achilles human PSM cells treated 
with vehicle control (DMSO) or 25 μM DAPT. Mean ± 95% confidence interval. 
n = 152 cells (control) or 106 cells (DAPT). e, HES7-Achilles fluorescence in 
human PSM cells following treatment with DMSO or DAPT (25 μM). 
n = 9 independent experiments. Scale bar, 100 μm. f, Kuramoto order 
parameter for HES7-Achilles cells treated with DMSO or 25 μM DAPT. Mean ± s.d. 
Paired two-sided t-test, P=2.6 x10. n = 131 cells (control) or 110 cells (DAPT). 
g, Experimental strategy for analysis of oscillations in isolated human PSM 
cells. h, Representative HES7-Achilles intensity profiles for three isolated 
human PSM cells in medium containing DMSO, 350 nM latrunculin A (lat A), or 
350 nM latrunculin A in combination with 25 μM DAPT. n = 5 independent 
experiments. 

Notch signalling has previously been implicated in the maintenance 
and local synchronization of oscillations*”’”’. Treating human and 
mouse HES7-Achilles cells with the Notch inhibitor DAPT (N-[N-(3,5- 
difluorophenacetyl)-L-alanyl]-S-phenylglycine t-butyl ester) in CLFBR 
medium led to a dampening of oscillations and eventual loss of HES7- 
Achilles expression (Fig. 3d, e, Extended Data Fig. 5n–p). Thus, HES7 
oscillations require active Notch signalling. The Kuramoto order param- 
eter was lower, and decreased more rapidly, in DAPT-treated cultures 
relative to control (control 0.407 ± 0.22 (n = 131 cells) versus DAPT- 
treated 0.266 ± 0.153 (n = 110 cells) P < 0.000001) (Fig. 3f, Extended Data 
Fig. 5q). We conclude that synchronization of HES7-Achilles oscillations 
in PSM cultures derived from human iPS cells is Notch-dependent. 

We further assessed whether YAP signalling regulates oscillations 
in human cells, as it does in mouse embryos’. No oscillations were 
detected when human PSM cells were cultured as isolated cells (Fig. 3g, 
h, Supplementary Video 7). However, treatment with latrunculin A— 
which inhibits YAP signalling*—restored oscillations (Fig. 3h, Extended 
Data Fig. 5r, Supplementary Video 7). Isolated cells treated with latrun- 
culin A continued to oscillate even with DAPT treatment (Fig. 3h, Sup- 
plementary Video 7). We could not detect substantial enrichment of 
NOTCH1 intracellular domain binding at the HES7 or LFNG promot- 
ers in isolated cells by chromatin immunoprecipitation followed by 
quantitative PCR (ChIP-qPCR) (Extended Data Fig. 5s). Isolated cells 
treated with latrunculin A alone, or in combination with DAPT, showed 
the characteristic approximately 5-h period observed in confluent 
cultures, which suggests that the period is controlled autonomously 
and independently of Notch cleavage (Extended Data Fig. 5u). The 
Kuramoto order parameter was significantly lower than in confluent 
controls (control 0.415 ± 0.194 (n = 53 cells) versus latrunculin A treat- 
ment 0.221 ± 0.137 (n = 18 cells) versus treatment with latrunculin A 
and DAPT 0.1972 ± 0.095 (n = 18 cells)), which suggests that cell com- 
munication is required for the maintenance of synchrony (Extended 
Data Fig. 5v, w). Thus, the human segmentation clock—similar to its 
mouse counterpart*—can be viewed as an excitable system in which 
Notch provides the stimulus and YAP controls the excitability threshold. 

In vivo, PSM cells experience posterior-to-anterior gradients of 
FGF and WNT signalling that control their maturation (Fig. 4a)'. In 
differentiating mouse and human cultures, staining for doubly phos- 
phorylated ERK (dpERK) and β-catenin showed that the FGF and WNT 
pathways are active at the neuromesodermal progenitor and poste- 
rior PSM stages, but are strongly downregulated at later stages in CL 
medium (Fig. 4a). Treatment with the FGF receptor inhibitor PD173074 
(PD17) decreased the dpERK signal (Extended Data Fig. 6a), indicating 
that ERK activation is FGF-dependent and most probably downstream 
of FGF8 and FGF17 (which are expressed by the cells) (Extended Data 
Fig. 6b). Thus, differentiating mouse and human cells are exposed 
to transient WNT and FGF signalling as in the posterior PSM in vivo 
(Fig. 4a). The regulation of FGF and WNT signalling in vitro is largely 
autonomous. 

We next assessed the effect of prematurely downregulating FGF 
and WNT signalling on segmentation-clock oscillations in vitro. FGF 
signalling was inhibited by treating human PSM cells with PD17 or the 
MEK1 and MEK2 inhibitor PD0325901 (PD03), whereas WNT signalling 
was blocked using the tankyrase inhibitors XAV939 (XAV) or IWR-1 
(Extended Data Fig. 6a, c–e). Both FGF and WNT inhibition resulted 
in dampening and eventual arrest of oscillations without affecting 
their period (Fig. 4b, c, Extended Data Fig. 6f–h). In the case of PD03, 
higher doses resulted in faster dampening and fewer oscillations before 
arrest (Fig. 4d, e, Extended Data Fig. 6i–l). Mouse Hes7-Achilles cells 
responded similarly to FGF and WNT inhibitors (Extended Data Fig. 6m). 
Oscillations in human cells treated with FGF inhibitors (but not cells 
treated with WNT inhibitors) exhibited a phase shift relative to control 
cells, regardless of inhibitor dosage (Fig. 4f, Extended Data Fig. 6n, o). 
We could also detect this phase shift in Notch target gene oscillations 
upon FGF inhibition, using quantitative PCR with reverse transcription 
(qRT-PCR) for the HES7 and LFNG genes (Extended Data Fig. 6p, q). 
These data suggest that FGF functions to modulate oscillator proper- 
ties in addition to controlling PSM maturation. 
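
A phase shift between a treated and a control oscillator has to be measured on the circle, so that phases just below 2π and just above 0 are treated as close. The sketch below shows one standard way to compute an absolute phase difference wrapped to the range [0, π]. It assumes that phases in radians have already been extracted from the traces by some other method, and the function name is hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def absolute_phase_difference(phase_a, phase_b):
    """Return the absolute circular difference between two phases, in [0, pi]."""
    # Reduce the raw difference to [0, 2*pi), then take the shorter arc.
    raw = (phase_a - phase_b) % TWO_PI
    return min(raw, TWO_PI - raw)


# Hypothetical phases (radians), illustration only.
near_wraparound = absolute_phase_difference(0.1, TWO_PI - 0.1)
antiphase = absolute_phase_difference(0.0, math.pi)
print(near_wraparound, antiphase)
```

Averaging this quantity over time points and cells yields a single summary of how far a treated population has drifted from the control, which is the kind of statistic a comparison across inhibitor doses would require.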

To further examine the role of FGF signalling on oscillatory proper- 
ties, we used an ex vivo system that consists of micropatterned cultures 
of PSM explants taken from the mouse line LuVeLu (which expresses a 
Lfng transcriptional reporter)° (Extended Data Fig. 6r). Treating mouse 
cultures with increasing doses of FGF inhibitors led to a dose-depend- 
ent decrease in number of oscillations (Extended Data Fig. 6s, t). We 
observed a progressive increase in the period with increasing doses 
of inhibitor, as observed for Lfng oscillations during PSM maturation 
in vivo“(Extended Data Fig. 6u). Our data thus indicate that FGF activity 
regulates the dynamics (period, phase and amplitude) of cyclic gene 
oscillations and does not only control the oscillatory arrest at the wave- 
front, as proposed in classical models?”>. 

Fig. 4 | FGF signalling regulates the dynamic properties of the segmentation 
clock. a, Left, scheme illustrating the posterior-to-anterior gradients of FGF 
and WNT signalling along the PSM. Right, immunofluorescence for dpERK, 
β-catenin and DAPI nuclear stain in differentiating human iPS cells and mouse 
ES cells. n = 8 independent experiments. Scale bar, 100 μm. b, Average intensity 
profiles for individual HES7-Achilles human PSM cells treated with vehicle 
control (DMSO), 250 nM PD03 or 250 nM PD17. Mean ± 95% confidence interval. 
n = 89 cells (control), 30 cells (PD03) or 34 cells (PD17). c, Average intensity 
profiles for individual HES7-Achilles human PSM cells treated with vehicle 
control (DMSO) or 2 μM XAV. Mean ± 95% confidence interval. n = 67 cells 
(control) or 29 cells (XAV). d, Number of HES7-Achilles oscillations before 
arrest in individual human PSM cells treated with 2 μM XAV, 250 nM PD17, or 
100 nM, 250 nM or 500 nM PD03. Mean ± s.d. One-way analysis of variance 
(ANOVA): 100 nM versus 250 nM, P=2.3 x 10°; 100 nM versus 500 nM, 
P=2.2x10™°; 250 nM versus 500 nM, P=1.5x10°. n = 34 cells per condition. 
e, Mean HES7-Achilles intensity for individual HES7-Achilles human PSM cells 
treated with 2 μM XAV, 250 nM PD17, or 100 nM, 250 nM or 500 nM PD03. 
Mean ± s.d. One-way ANOVA: 100 nM versus 250 nM, P=1.2x10°; 100 nM 
versus 500 nM, P=3 x10; 250 nM versus 500 nM, P=6.9 x 10°°. n = 46 cells 
(XAV), n = 28 cells (PD17), n = 47 cells (100 nM PD03), n = 64 cells (250 nM PD03), 
n = 26 cells (500 nM PD03). f, Summary statistics comparing the instantaneous 
absolute phase difference relative to control for individual cells treated with 
vehicle control (DMSO), 2 μM XAV, 250 nM PD17, or 100 nM, 250 nM or 500 nM 
PD03. Mean ± s.d. One-way ANOVA: control versus XAV, P = 0.0578; control 
versus PD17, P= 9.2 x 10°°; control versus 100 nM PD03, P=1.3 x10; control 
versus 250 nM PD03, P=1.1x 107°; control versus 500 nM PD03, P=1.1x 107; 
100 nM versus 250 nM PD03, P = 0.8338; 100 nM versus 500 nM PD03, 
P = 0.0601; 250 nM versus 500 nM, P = 0.061. NS, not significant. n fixed at 
11,000 observations. Full histograms are provided in Extended Data Fig. 6n. 
g–i, HES7-Achilles and MESP2-mCherry intensity profiles in small regions of 
interest within human PSM cultures. Mean ± s.d. Dotted line denotes the 
threshold for MESP2-mCherry activation (25 AU). g, Vehicle control (DMSO). 
h, PD03 (250 nM). i, XAV (2 μM). n = 15 replicate experiments. 

In vivo in mouse and chicken embryos, cells at the determination 
front periodically activate Mesp2 and Ripply2 in a stripe that defines 
the boundaries of the future segment”®. Using quantitative PCR, 
we observed that the arrest of HES7-Achilles oscillations in human 
cells coincided with MESP2 and RIPPLY2 expression, which could be 
delayed by culturing cells in CLFBR medium (Extended Data Fig. 1f, s). 
To image the transition from the oscillatory to the segmental fate, 
we generated a dual human iPS cell reporter line carrying a knock-in 
MESP2-H2B-mCherry reporter in addition to HES7-Achilles. When cul- 
tured in CLFBR medium, a series of approximately 12 oscillations was 
followed by the activation of the MESP2-mCherry signal in an increas- 
ing subpopulation of scattered cells (Fig. 4g, Extended Data Fig. 7a, b, 
Supplementary Video 8). Treatment with DAPT prevented MESP2- 
mCherry activation, as expected given that Mesp2 is a Notch target 
in mouse embryos (Extended Data Fig. 7c, Supplementary Video 8). 
Conversely, oscillatory arrest and MESP2-mCherry onset was prema- 
turely triggered by either FGF or WNT inhibition (Fig. 4h, i, Extended 
Data Fig. 7a, b, d, e, Supplementary Video 8). Increasing concentrations 
of PD03 resulted in faster activation of MESP2-mCherry (Extended 
Data Fig. 7b). Therefore, PSM cells derived from human iPS cells reca- 
pitulate segmental determination, which is dynamically controlled by 
levels of FGF and WNT. 


Our work provides evidence for the existence of a human segmenta- 
tion clock, demonstrating the conservation of this oscillator from fish 
to human. We identify the human clock period as around 5 h, indicating 
that it operates roughly 2× slower than the mouse counterpart”. This is 
consistent with the known difference in developmental timing between 
mouse and human embryos”. Our culture conditions, in which cells are 
treated with only two chemical compounds in a defined medium, enable 
the production of an unlimited supply of human PSM-like cells. This 
represents an ideal system for investigating the dynamical properties 
of the oscillator, as well as its dysregulation in pathological segmenta- 
tion defects such as congenital scoliosis. 

Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Generation of reporter lines 

The CRISPR-Cas9 system for genome editing”® was used to gener- 
ate three reporter lines in human iPS cells (HES7-Achilles, HES7- 
Achilles;pCAG-H2B-mCherry and HES7-Achilles;MESP2-mCherry) and 
one mouse ES cell reporter line (Hes7-Achilles). To target the HES7 locus 
in human NCRM1 iPS cells, a single-guide RNA (Extended Data Table 1) 
targeting the 3’ end of HES7 was designed using the MIT Crispr Design 
Tool (www.crispr.mit.edu) and cloned into the pGuide-it-tdTomato vec- 
tor (Takara cat. no. 632604). We also generated a repair vector consist- 
ing of 1-kb 3’ and 5’ homology arms flanking a self-cleaving T2A peptide 
sequence, followed by the fast-folding yellow fluorescent protein (YFP) 
variant Achilles'®, two destabilization domains (CL1 and PEST), and a 
nuclear localization signal (T2A-Achilles-NLS-CL1-PEST) in a pUC19 
vector backbone by means of Gibson assembly (NEB). The assem- 
bled repair vector was then mutated by site-directed mutagenesis to 
eliminate the PAM site (specific mutation noted in Extended Data 
Table 1) using the In-Fusion cloning kit (Takara). Both the pGuide- 
it-tdTomato and targeting vectors were delivered to iPS cells by 
nucleofection using a NEPA 21 electroporator. Twenty-four hours 
after nucleofection, cells were sorted by TdTomato expression using 
an S3 cell sorter (Biorad) and seeded at low density in Matrigel-coated 
plates (Corning, cat. no. 35277) in mTeSR1 (StemCell Technologies cat. 
no. 05851) + 10 μM Y-27632 dihydrochloride (Tocris Bioscience, cat. 
no. 1254). Single cells were allowed to expand clonally and individual 
colonies were screened by PCR for targeted homozygous insertion of 
2A-Achilles-CL1-PEST-NLS immediately before the stop codon of HES7. 
Positive clones were sequenced to ensure no undesired mutations in 
the HES7 locus had been introduced by the genome-editing process. 
Three homozygous clones were further validated by qRT-PCR and 
immunofluorescence. 

An identical approach was used to target the Hes7 locus in mouse 
E14 ES cells, except the pGuide-it-tdTomato and targeting vectors were 
delivered by lipofection using lipofectamine 3000 (Invitrogen cat. no. 
L3000001). Following sorting, TdTomato⁺ cells were seeded at low 
density on gelatin-coated dishes (EMD Millipore cat. no. ES-006-B) 
in 2i medium (see below). Individual colonies were then transferred 
to a 96-well plate for expansion. Once ready to passage, the master 
plate was split onto 3 different 96-well plates. One plate was used 
for genotyping and the other two were frozen. Positive clones were 
then thawed, expanded and had their genotype confirmed by PCR 
and sequencing. Only one clone carrying the targeted homozygous 
insertion of 2A-Achilles-CL1-PEST-NLS in the Hes7 locus was found and 
further characterized. 

To generate the double-reporter line HES7-Achilles;MESP2-mCherry, 
we cotransfected the pGuide-it-tdTomato vector containing a single- 
guide RNA targeting the 3’ end of the MESP2 coding sequence (Extended 
Data Table 1), and a targeting vector composed of 1-kb homology 
arms flanking a T2A-H2B-mCherry sequence in the pUC19 backbone, 
in NCRM1 HES7-Achilles cells by nucleofection (Amaxa). We sorted, 
expanded, genotyped and sequenced individual clones. Three inde- 
pendent instances of successful homozygous insertion were found. 

To insert the constitutively expressed pCAG-H2B-mCherry reporter 
in the safe harbour AAVS1 locus in NCRM1 HES7-Achilles cells, we used a 
previously described approach”. In brief, we cloned the H2B-mCherry 
sequence into the pAAVS1-P-CAG-DEST vector (Addgene) by Gibson 
assembly and co-transfected it along with the pXAT2 vector (Addgene) 
into HES7-Achilles cells. Two days after nucleofection, we selected posi- 
tive clones by supplementing mTeSR1 with puromycin (0.5 μg/ml, 
Sigma-Aldrich cat. no. P7255) for a total of 10 days. We obtained two 
positive clones and confirmed the homozygous insertion of H2B- 
mCherry by PCR. 


Mouse ES cell culture and 2D differentiation 

E14 mouse ES cells were maintained under feeder-free conditions in 
gelatin-coated dishes with 2i medium composed of high-glucose DMEM 
(Gibco cat. no. 11965-118) supplemented with 1% GlutaMAX (Gibco cat. 
no. 35050061), 1% non-essential amino acids (Gibco cat. no. 11140-050), 
1% sodium pyruvate (Gibco cat. no. 11360-070), 0.01% bovine serum 
albumin (BSA) (Gibco cat. no. 15260-037), 0.1% β-mercaptoethanol 
(Gibco cat. no. 21985-023), 15% fetal bovine serum (FBS) (EMD Millipore 
cat. no. ES009B), 1,000 U/ml LIF (EMD Millipore cat. no. ESG1106), 3 μM 
CHIR99021 (Sigma Aldrich cat. no. SML1046) and 1 μM PD0325901 
(Stemgent cat. no. 04-006). Mouse ES cells were passaged by TrypLE 
(Gibco cat. no. 12605010) dissociation every 2 days at a density of 1 × 10⁴ 
cells per square centimetre. ES cells were tested for mycoplasma con- 
tamination. We verified cell line identity by staining for pluripotency 
markers POU5F1 and SOX2. Paraxial mesoderm differentiation was 
carried out as previously described”°, with small modifications. Mouse 
ES cells were seeded at a density of 1 × 10⁴ cells per square centimetre in 
fibronectin-coated dishes (BD Biosciences cat. no. 356008) in N2B27 
medium (StemCell Technologies cat. no. 07156 and 05731) supple- 
mented with 25 ng/ml activin A (R&D systems cat. no. 338-AC-050) and 
12 ng/ml bFGF (PeproTech cat. no. 450-33). After 48 h in culture, the 
differentiation medium was changed to high-glucose DMEM (Gibco 
cat. no. 11965-118) supplemented with 1% GlutaMAX (Gibco cat. no. 
35050061), 1% non-essential amino acids (Gibco cat. no. 11140-050), 
1% sodium pyruvate (Gibco cat. no. 11360-070), 0.01% BSA (Gibco cat. 
no. 15260-037), 0.1% β-mercaptoethanol (Gibco cat. no. 21985-023), 
15% FBS (EMD Millipore cat. no. ES009B), 3 μM CHIR99021 (Sigma 
Aldrich cat. no. SML1046) and 0.5 μM LDN193189 (Stemgent cat. no. 
04-0074). Cells were cultured for four additional days, and medium 
was changed daily. For live-imaging experiments, cells were seeded 
on 24-well glass-bottomed plates (In vitro Scientific cat. no. P24-1.5H-N) 
on day 0 and cultured in DMEM without phenol red (Gibco cat. no. 
31053028) from day 4 onwards. To extend the time spent in the oscilla- 
tory state, we additionally supplemented the differentiation medium 
with 50 ng/ml mouse FGF4 (R&D Systems cat. no. 5846-F4-025), 1 μg/ml 
heparin (Sigma Aldrich cat. no. H3393-100KU), 2.5 μM BMS493 (Sigma 
Aldrich cat. no. B6688-5MG) and 10 μM Y-27632 dihydrochloride (CLFBR 
medium) from day 4 onwards. 


Human iPS cell culture and 2D differentiation 

Human stem cell work was approved by Partners Human Research 
Committee (Protocol Number 2017P000438/PHS). We complied 
with all relevant ethical regulations. Written informed consent from 
the donor of the NCRM1 iPS cells was obtained by Rutgers University 
at the time of sample collection. NCRM1 iPS cells (RUCDR, Rutgers 
University) and lines carrying the MSGN1-Venus”°, HES7-Achilles, 
HES7-Achilles;pCAG-H2B-mCherry and HES7-Achilles;MESP2-mCherry 
reporters were maintained in Matrigel-coated plates (Corning, cat. no. 
35277) in mTeSR1 medium (StemCell Technologies cat. no. 05851) as 
previously described’. All cell lines were tested for mycoplasma con- 
tamination. We verified cell line identity by staining for pluripotency 
markers POU5F1 and SOX2. Paraxial mesoderm differentiation was 
carried out as previously described’. In brief, mature iPS cell cultures 
were dissociated in Accutase (Corning cat. no. 25058CI) and seeded at a 
density of 3 x 10* cells per square centimetre on Matrigel-coated plates 
in mTeSR1 and 10 µM Y-27632 dihydrochloride (ROCKi; Tocris Bioscience, 
cat. no. 1254). Cells were cultured for 24-48 h until small, compact 
colonies were formed. Differentiation was initiated by switching to CL 
medium consisting of DMEM/F12 GlutaMAX (Gibco cat. no. 10565042) 
supplemented with 1% insulin-transferrin-selenium (ITS) (Gibco cat. no. 
41400045), 3 µM Chir 99021 (Tocris cat. no. 4423) and 0.5 µM LDN193189 
(Stemgent cat. no. 04-0074). On day 3 of differentiation, cells were 
changed to CLF medium consisting of CL medium with 20 ng/ml 
mouse bFGF (PeproTech cat. no. 450-33). Medium was changed daily. 

For live-imaging experiments, differentiation was performed as 
described, except cells were seeded on 35-mm matrigel-coated glass- 
bottomed dishes (MatTek cat. no. P35G-1.5-20-C) or 24-well glass- 
bottomed plates (In vitro Scientific cat. no. P24-1.5H-N). DMEM/F12 
without phenol red was used to reduce background fluorescence (Gibco 
cat. no. 21041025). 

To extend the oscillatory window of differentiated PSM cells, we 
cultured HES7-Achilles cells in CLFBR medium consisting of DMEM/ 
F12 GlutaMAX, 1% ITS, 3 µM Chir 99021, 0.5 µM LDN193189, 50 ng/ml 
mouse FGF4 (R&D Systems cat. no. 5846-F4-025), 1 µg/ml heparin 
(Sigma Aldrich cat. no. H3393-100KU), 2.5 µM BMS493 (Sigma Aldrich 
cat. no. B6688-5MG) and 10 µM Y-27632 dihydrochloride starting on 
day 2 of differentiation*. Medium was refreshed daily. 

To automatically track oscillations in individual cells within the culture, 
we mixed HES7-Achilles;pCAG-H2B-mCherry cells with NCRM1 cells 
in a ratio of 1:100 at the time of seeding for pre-differentiation. Cells 
were then differentiated normally under CLFBR conditions. 

To examine oscillations in isolated cells, we differentiated HES7- 
Achilles cells normally (CL medium) for the first 2 days on 35-mm plastic 
dishes and dissociated them with accutase (Corning cat. no. 25058CI) on 
day 2 of the differentiation protocol. Cells were reseeded on fibronec- 
tin-coated (BD Biosciences cat. no. 356008) or BSA-coated (Gibco cat. 
no. 15260-037) 24-well glass-bottomed plates at high (500,000 cells per 
well) or low density (25,000-50,000 cells per well) in CLFBR medium. 
Using our regular DMEM/F12 base medium resulted in poor survival of 
low-density cultures. We found that using RHB basal medium (Takara/ 
Clontech cat. no. Y40000), supplemented with 5% knockout serum 
replacement (KSR) (Thermo Fisher cat. no. 10828-028) improved sur- 
vival considerably. 


Explant culture 

Explant culture was performed as previously described*. LuVeLu 
CD1 E9.5 mice (both male and female) were killed according to local 
regulations, consistent with national and international guidelines. 
We complied with all relevant ethical regulations. The study protocol 
was approved by Brigham and Women’s Hospital IACUC/CCM (pro- 
tocol number NO00478). Sample sizes were not estimated, nor were 
randomization or blinding performed. Tail buds were dissected with 
a tungsten needle and ectoderm was removed using accutase (Life 
Technologies). Explants were then cultured on a fibronectin-coated 
plate (LabTek chamber). The medium consisted of DMEM, 4.5 g/l glucose, 
2 mM L-glutamine, non-essential amino acids 1x (Life Technologies), 
penicillin 100 U/ml, streptomycin 100 µg/ml, 15% FBS, Chir-99021 3 µM, 
LDN193189 200 nM, BMS-493 2.5 µM, mouse FGF4 50 ng/ml, heparin 
1 µg/ml, HEPES 10 mM and Y-27632 10 µM. Explants were incubated at 
37 °C, 7.5% CO2. Live imaging was performed on a confocal microscope 
Zeiss LSM 780, using a 20x objective (note that the tiling could create 
lines between the different images). For micropattern culture, explants 
were cultured overnight in standard condition, then dissociated using 
trypsin-EDTA and plated on fibronectin-coated CYTOOchips Arena in 
a CYTOOchamber 4 wells. 


Small molecule inhibitor treatments 

To inhibit Notch signalling, 25 µM DAPT (Sigma Aldrich cat. no. D5942-5MG) 
was added to CLFBR medium on day 2 of differentiation. To inhibit 
FGF signalling, PD0325901 (Stemgent 04-006) or PD173074 (Cayman 
Chemical cat. no. 219580-11-7) were added to CL or CLFBR media at 
the indicated concentrations. WNT signalling was inhibited with the 
tankyrase inhibitors XAV939 (Sigma Aldrich cat. no. X3004) and IWR-1 
(Sigma Aldrich cat. no. I0161) at 2 µM and 12 µM, respectively, in CLFBR 
medium. Cell division was blocked by arresting cells at early S phase 
with 5 µM aphidicolin (Sigma Aldrich cat. no. A0781) in CLFBR medium. 
Cells were pretreated for 24 h with aphidicolin before imaging (during 
day 2). The onset of imaging was thus delayed by one day and started 
only on day 3. Aphidicolin was maintained in the medium throughout 
imaging. Latrunculin A (Cayman Chemical cat. no. 10010630), which 
inhibits actin polymerization and YAP signalling, was used at 350 nM 
in RHB basal medium supplemented with CLFBR and 5% KSR. Mouse 
explants and micropatterned cultures were treated with PD0325901 
(Sigma, at concentration as described in Extended Data Fig. 6) and 
PD173074 (Sigma, 250 nM). 


Time-lapse microscopy 

Time-lapse imaging of PSM cells was performed on a Zeiss LSM 780 
point-scanning confocal inverted microscope fitted with a large temperature 
incubation chamber and a CO2 module. An Argon laser at 
514 nm and 7.5% power was used to excite the Achilles fluorophore 
through a 20x Plan Apo (N.A. 0.8) objective, and a DPSS 561 laser at 561 
nm and 2% laser power was used to excite mCherry samples. Images 
were acquired with an interval of 18 min in the case of human samples 
and 4.5 min for mouse samples, for a total of 24-48 h. A 3 x 3 tile of 
800 x 800 pixels per tile with a single z-slice of 18-µm thickness and 
12-bit resolution was acquired per position. Multiple positions, with at 
least two positions per sample, were imaged simultaneously using a 
motorized stage. Explant imaging was performed on a Zeiss LSM780 
microscope using a 20x/0.8 objective. For mouse cell imaging, a single 
section (about 19.6-µm wide) with tiling (3 x 3) of a 512 x 512-pixel field 
was acquired every 7.5 min (in most experiments) at 8-bit resolution. 


Immunostaining 
For immunostaining of 2D cultures, cells were grown on Matrigel-coated 
glass-bottomed plates or 12-mm glass coverslips placed inside plastic 
dishes or, alternatively, on 24 well glass-bottomed plates (In vitro Sci- 
entific cat. no. P24-1.5H-N). Cells were rinsed in Dulbecco’s phosphate 
buffered saline (DPBS) and fixed in a 4% paraformaldehyde solution 
(Electron Microscopy Sciences cat. no. 15710) for 20 min at room tem- 
perature, then washed 3 times with phosphate buffered saline (PBS). 
Typically, samples were permeabilized by washing 3 times for 3 min 
each in Tris buffered saline (TBS) with 0.1% Tween (TBST) and blocked 
for 1 h at room temperature in TBS with 0.1% Triton and 3% FBS. Primary 
antibodies were diluted in blocking solution and incubated overnight 
at 4 °C with gentle rocking. Primary antibodies and dilution factors are 
listed in Extended Data Table 2. Following 3 TBST washes and a short 
10-min block, cells were incubated with Alexa-Fluor-conjugated second- 
ary antibodies (1:500) and Hoechst33342 (1:1,000) overnight at 4 °C 
with gentle rocking. Three final TBST washes and a PBS rinse were per- 
formed, and cells were mounted in fluoromount G (Southern Biotech 
cat. no. 0100-01). Images were acquired using either a Zeiss LSM880 
or LSM780 point scanning confocal microscope with a 20x objective. 
For visualizing dpERK1 and dpERK2 in 2D monolayer differentiated 
cells, cells were transferred onto ice and quickly rinsed in ice-cold PBS 
containing 1 mM sodium vanadate (NaVO,). Next, cells were fixed in 4% 
paraformaldehyde for 15 min at room temperature, rinsed 3 times in 
PBS and dehydrated in cold methanol at —20 °C for 10 min. Following 3 
PBS rinses, cells were blocked in PBS containing 0.1% Triton X-100 and 
5% goat serum and incubated in dpERK1 and dpERK2 antibody diluted 
in antibody buffer (0.1% Triton X-100 and 1% BSA in PBS) overnight at 
4 °C. Cells were washed in PBS, and incubated in blocking solution for 
10 min and with secondary antibody and Hoechst33342 in antibody 
buffer overnight at 4 °C. Cells were rinsed three times in PBS before 
mounting and imaging as described in ‘Immunostaining’. 


RNA extraction, reverse transcription and qPCR 

Cells were collected in Trizol (Life Technologies cat. no. 15596-018), 
followed by precipitation with chloroform and ethanol and transferred 
onto Purelink RNA Micro Kit columns (Thermo Fisher cat. no. 12183016) 
according to manufacturer’s protocol, including on-column DNase 
treatment. A volume of 22 µl RNase-free water was used for elution 
and RNA concentration and quality were assessed with a Nanodrop. 
Typically, between 0.2 and 1 µg of RNA was reverse-transcribed using 
Superscript III First Strand Synthesis kit (Life Technologies cat. no. 
18080-051) and oligo-dT primers to generate cDNA libraries. 

For real-time quantitative PCR, cDNA was diluted 1:30 in water and 
qPCR was performed using the iTaq Universal SYBR Green kit (Bio-Rad 
cat. no. 1725124). Each gene-specific primer and sample mix was run 
in triplicate. Each 10-µl reaction contained 5 µl 2x SYBR Green 
Master Mix, 0.4 µl of 10 µM primer stock (1:1 mix of forward and reverse 
primers), and 4.6 µl of diluted cDNA. qPCR plates were run on a Bio-Rad 
CFX384 thermocycler with the following cycling parameters: initial 
denaturation step (95 °C for 1 min), 40 cycles of amplification and 
SYBR green signal detection (denaturation at 95 °C for 5 s, annealing, 
extension and plate-read at 60 °C for 40 s), followed by final rounds of 
gradient annealing from 65 °C to 95 °C to generate dissociation curves. 
Primer sequences are listed in Extended Data Table 3. All unpublished 
primers were validated by checking for specificity (single peak in 
melting curve) and linearity of amplification (serially diluted cDNA 
samples). For relative gene expression analysis, the ΔΔCt method was 
implemented with the CFX Manager software. PPIA was used as the 
housekeeping gene in human iPS cell samples, and Actb was used in 
mouse ES cell samples. Target gene expression is expressed as fold 
change relative to undifferentiated human iPS or mouse ES cells. 
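
The ΔΔCt calculation described above can be illustrated with a minimal Python sketch. The function name and the Ct values are hypothetical, and an amplification efficiency of 100% (a doubling per cycle) is assumed; the analysis reported here was run in CFX Manager, not with this code.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change by the delta-delta-Ct method, assuming one doubling per cycle."""
    # Normalise the target gene to the housekeeping gene in each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the differentiated sample to the undifferentiated control.
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: target gene and housekeeping gene, sample then control.
print(fold_change(24.0, 18.0, 29.0, 18.0))  # 32.0
```

A target that reaches threshold five cycles earlier than in the control, at equal housekeeping Ct, therefore corresponds to a 32-fold increase.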


Flow cytometry analysis 

To determine the fraction of PSM cells that express pMsgn1-Venus or 
MSGN1-Venus, cultures were dissociated in Accutase and analysed by 
flow cytometry using an S3 cell sorter (Biorad). Undifferentiated ES or 
iPS cells, which do not express the fluorescent protein, were used as 
a negative control for gating purposes. Samples were analysed in biological 
triplicates. Results are presented as the percentage of Venus+ 
cells in the sorted fraction. 


ChIP-qPCR 

Binding of NOTCH1 to the promoters of ACTB, LFNG and HES7 was analysed 
by ChIP. Cells were crosslinked for 30 min using ChIP Cross-link 
Gold reagent (Diagenode, C01019027), rinsed with PBS and then fixed in 1% 
formaldehyde for 15 min. After quenching with 125 µM glycine and rinsing 
the crosslinked cells with ice-cold PBS, cells were collected using a cell 
scraper. Cell lysis and pulldown of chromatin with A/G-protein-coated 
magnetic beads was performed on approximately 300,000 cells per 
immunoprecipitation using MAGnify ChIP kit (ThermoFisher cat. no. 
492024) following manufacturer’s instructions. Chromatin fragmen- 
tation was performed using a Covaris M220 sonicator for 5 min (75 W 
PIP, 5% DF and 200 cycles per burst). NOTCH1 immunoprecipitation 
was performed using 3.3 µg of anti-NOTCH1 (D1E11, 3608S Cell Signaling) 
per immunoprecipitation. This antibody binds the transactivation 
domain of NOTCH1 and has previously been successfully used 
for ChIP-seq applications”’. Half a microgram of anti-acetyl-histone 
H3 (Lys9) (C5B11, 9649S Cell Signal) was used. Fold enrichment (2°) 
was calculated relative to isotype IgG controls, immunoprecipitated 
with 3.3 µg of normal rabbit IgG (2729S Cell Signal). Enriched loci after 
ChIP were interrogated by qPCR using primers designed to amplify 
approximately 100 bp surrounding previously identified RBPJ binding 
sites in the HES7 and LFNG promoters”. 


Image analysis 

Time-lapse movies of HES7-Achilles were first stitched and sepa- 
rated into subsets by position in the Zen program (Zeiss). Then, back- 
ground subtraction and Gaussian blur filtering were performed in 
Fiji? to enhance image quality. When single cell tracking was not per- 
formed, a small region of interest (ROI) was drawn and the mean fluo- 
rescence intensity over time was calculated. Intensity is presented in 
arbitrary units. When appropriate, the moving average was subtracted 
with window size of 3 h for human PSM (that is, 10 time points) and 
mouse PSM (that is, 40 time points), and then normalized between 
0 and 1. For smoothening, we applied the Sgolay filtering function in 
MATLAB. 
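
The moving-average subtraction and normalization between 0 and 1 can be sketched as follows in Python. This is an illustration only: the function names are hypothetical, the window is centred and truncated at the ends of the trace (an assumption, as the edge handling is not specified above), and the Savitzky-Golay smoothing step is omitted.

```python
def moving_average(signal, window):
    """Centred moving average; the window is truncated at both ends of the trace."""
    half = window // 2
    averaged = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        averaged.append(sum(signal[lo:hi]) / (hi - lo))
    return averaged


def detrend_and_normalize(signal, window):
    """Subtract the moving average, then rescale the result to the range [0, 1]."""
    trend = moving_average(signal, window)
    detrended = [s - m for s, m in zip(signal, trend)]
    lo, hi = min(detrended), max(detrended)
    if hi == lo:  # flat trace: nothing to rescale
        return [0.0] * len(signal)
    return [(d - lo) / (hi - lo) for d in detrended]


# Hypothetical trace: a period-4 oscillation on a linear drift, 10-point window.
trace = [0.1 * i + [0, 1, 0, -1][i % 4] for i in range(40)]
normalized = detrend_and_normalize(trace, 10)
```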

Following moving average subtraction, we performed Fourier 
transformation of HES7-Achilles intensity profiles to determine 
the predominant period of oscillations. The Hilbert transformation 
was used to calculate the instantaneous frequency and phase of HES7- 
Achilles oscillations from ROIs. To compare the phase between ROIs in 
DMSO- and PD17- or PD03-treated cultures, we used the Hilbert transformation 
to calculate the instantaneous phase of each curve separately, 
and then subtracted the phase of treated cells from untreated 
cells at each time point. Phase difference is expressed as the average 
of instantaneous phase differences before the arrest of oscillations 
in treated cells. 

To manually track oscillations in PSM cells derived from mouse ES 
cells as well as isolated or sparse human HES7-Achilles cells in an NCRM1 
background, we tracked cells by drawing a circle around the nucleus 
of an individual cell at each time point and measuring fluorescence 
intensity inside the ROI. To remove saturated pixels corresponding to 
autofluorescent debris in mouse ES cell PSM movies, we set pixels with 
intensity >700 AU (above the dynamic range of Hes7-Achilles) to the 
background level (100 AU) in MATLAB. In the case of MESP2-mCherry, 
we established a threshold for activation (25 AU) by taking the mean 
of several ROIs representing the background noise. 

For mouse explants, kymographs were done in Fiji? by drawing a 
rectangle from the starting centre of the travelling waves to the edge 
of the explant perpendicular to the direction of the wave. The intensity 
along the long axis was measured and the image was smoothened (this 
filter replaces each pixel with the average of its 3 x 3 neighbourhood). 

Fluorescence intensity profiles were done by selecting a circular 
region of interest in Fiji? and by measuring the total intensity over time 
for this region; LuVeLu intensity is given in arbitrary units (normal- 
ized by the initial value) and a smoothing function (average over three 
points) was applied. Fluorescence intensity shows the mean fluores- 
cence smoothed by applying a moving average over five points (with 
equal weight). For the quantification of micropattern experiments, 
a ROI encompassing the entire surface of one circle was drawn and 
the LuVeLu intensity was measured using the Time Series Analyzer 
V3 plugin on Fiji. The period was measured by measuring the time 
between two peaks or two troughs. The average intensity was measured 
by averaging the intensity over 3 h to avoid instantaneous variations 
owing to the oscillations. 


Automatic image segmentation and cell tracking 

Cells were automatically segmented and tracked on the microscopy 
movies using a custom algorithm. To this end, we first identified and 
listed the cell positions and cell shapes by detecting the connected 
components of a thresholded image of the pCAG-H2B-mCherry channel 
(using the MATLAB function bwconncomp). For 
reliability, we used a minimal de-noising based on morphological 
operations (imopen then imclose functions of MATLAB, both with 
radius of 1 pixel). The shape of the cell was used to detect the level of 
expression of HES7-Achilles by considering the average HES7-Achil- 
les level within the connected component detected in the pCAG-H2B- 
mCherry channel. This provides us with a list of cell positions together 
with the associated average HES7-Achilles intensity, for each frame of 
the microscopy movie. 

Tracks were then reconstructed consecutively by finding, given a 
cell in frame k, the closest cell in frame k+1 within a distance of 20 µm, 
consistent with the typical movement of a cell between two frames, 
and not too large (to avoid switching tracks). This provided us with cell 
tracks—the trajectories of the cells in the microscopy field. By match- 
ing these tracks with the recorded HES7-Achilles intensity, we thus 
obtained HES7-Achilles activity as a function of time for each single 
cell tracked by the algorithm. 
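
The frame-to-frame linking step can be sketched in Python as a nearest-neighbour search with a distance cutoff. The function name and the coordinates are hypothetical, and positions are assumed to be already converted to micrometres; this is not the custom MATLAB code used for the analysis.

```python
import math


def link_frames(cells_k, cells_k1, max_dist=20.0):
    """For each cell in frame k, return the index of the closest cell in frame k+1
    that lies within max_dist (in micrometres), or None if there is no such cell."""
    links = []
    for x, y in cells_k:
        best_index, best_dist = None, max_dist
        for j, (u, v) in enumerate(cells_k1):
            dist = math.hypot(u - x, v - y)
            if dist <= best_dist:
                best_index, best_dist = j, dist
        links.append(best_index)
    return links


# Hypothetical centroids in two consecutive frames.
print(link_frames([(0, 0), (100, 100)], [(101, 99), (3, 4)]))  # [1, 0]
print(link_frames([(0, 0)], [(30, 0)]))  # [None]
```

Returning None when no candidate lies within the cutoff ends a track rather than letting it jump to a distant cell.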


Phase analysis 

Whereas collective oscillations appear very regular, HES7-Achilles 
expression in single cells shows heterogeneous profiles and fluctuat- 
ing background fluorescence intensity (Supplementary Video 5), and 
phase detection requires specific attention™. To derive accurately a 
phase of oscillation for a single cell, we used a custom method based on 
Hilbert transform (method 1), and two control methods that provided 
very similar results (methods 2 and 3). We relied on method 1 for the 
main figures, as this method provided an accurate estimate even during 
the first periods of the oscillations. 


Hilbert transform (method 1). The Hilbert transform is a functional 
transform of time series, the argument of which provides an efficient 
estimate of the phase of a signal (and its modulus, the envelope am- 
plitude). Hilbert transforms are sensitive to drifts in the signals and 
changes in the shape of the oscillation. Classically, Hilbert transform 
follows a detrending preprocessing based on removing a linear drift. To 
improve the evaluation of the phase using the Hilbert transform in the 
present case (in which drifts are nonlinear and amplitudes vary 
in time), we used a local renormalization algorithm, similar to a previously 
published algorithm™, consisting of (i) centring the signal locally 
using a moving average computed over a time window of 6 h around 
the current time point (MATLAB function movmean, 6 h providing a 
duration slightly longer than the period of the average signal), enabling 
correction for local changes in the average signal, and (ii) normalizing 
the amplitude by dividing the centred signal by a sliding standard deviation, 
computed on the same window of 6 h (MATLAB function movstd). 
We then evaluated the phase using the hilbert function of MATLAB. 
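
A minimal Python sketch of method 1 is given below. It is not the MATLAB code used in this study: the function names are hypothetical, the window is centred and truncated at the edges, the standard deviation is the population form (both assumptions), and the analytic signal is obtained from a naive discrete Fourier transform as a stand-in for MATLAB's hilbert function.

```python
import cmath
import math


def moving_stats(x, half):
    """Moving mean and standard deviation over a centred window of +/- half samples."""
    means, stds = [], []
    for i in range(len(x)):
        w = x[max(0, i - half): i + half + 1]
        m = sum(w) / len(w)
        means.append(m)
        stds.append(math.sqrt(sum((v - m) ** 2 for v in w) / len(w)))
    return means, stds


def analytic_phase(x):
    """Phase of the analytic signal of x, computed with an O(n^2) DFT."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        if 0 < 2 * k < n:
            spec[k] *= 2      # double the positive frequencies
        elif 2 * k > n:
            spec[k] = 0       # discard the negative frequencies
    z = [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
         for t in range(n)]
    return [cmath.phase(v) for v in z]


def local_phase(x, half):
    """Centre and rescale the signal locally, then take the Hilbert phase."""
    means, stds = moving_stats(x, half)
    normed = [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(x, means, stds)]
    return analytic_phase(normed)


# Hypothetical trace: drifting baseline and growing amplitude, period of 20 frames.
trace = [5 + 0.1 * t + (1 + 0.01 * t) * math.cos(2 * math.pi * t / 20) for t in range(100)]
phases = local_phase(trace, 10)
```

With an 18-min frame interval, a 6-h window corresponds to 20 frames, that is, half = 10 in this sketch.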


Cross-correlations (method 2). We also developed a methodology 
for evaluating phase shifts between two signals (S1(t) and S2(t)) based 
on a local cross-correlation estimate. In detail, at a given time t, the 
algorithm finds the delay dt between 0 and 4 h that maximizes the correlation 
between the chunk of signal S1(s) and S2(s + dt) over the time 
interval s € [t,t + 6 h]. We developed this algorithm using a custom 
MATLAB code and used this algorithm to compute phase differences 
between pairs of cells. 
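
The local cross-correlation estimate can be sketched in Python as follows. The function names are hypothetical, time is measured in frames rather than hours, and the Pearson coefficient is assumed as the correlation measure; the custom MATLAB code itself is not reproduced here.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def best_delay(s1, s2, t, window, max_delay):
    """Delay d in [0, max_delay] (frames) that maximizes the correlation between
    s1[t : t + window] and s2[t + d : t + d + window]."""
    chunk1 = s1[t:t + window]
    best_d, best_r = 0, -2.0
    for d in range(max_delay + 1):
        chunk2 = s2[t + d:t + d + window]
        if len(chunk2) < len(chunk1):
            break  # ran past the end of the second signal
        r = pearson(chunk1, chunk2)
        if r > best_r:
            best_d, best_r = d, r
    return best_d


# Hypothetical signals with a period of 20 frames; s2 lags s1 by 3 frames.
s1 = [math.sin(2 * math.pi * i / 20) for i in range(80)]
s2 = [math.sin(2 * math.pi * (i - 3) / 20) for i in range(80)]
print(best_delay(s1, s2, 5, 20, 10))  # 3
```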


Method 3. A third method used for control was developed on the basis 
of detecting peaks of the signals. In detail, we detected the times at 
which the signal peaks using the findpeaks function of MATLAB. When 
peaks are detected at times $t_0, t_1, \ldots, t_n$, the phase of the signal at a given 
time $t \in [t_i, t_{i+1}]$ was defined as the relative fraction of time between the 
two consecutive peaks, 

$$\theta(t) = 2\pi\,\frac{t - t_i}{t_{i+1} - t_i}.$$

The findpeaks function was also used to count the number of oscilla- 
tions before arrest at the single-cell level. 


Synchronization 

To quantify the level of synchrony between the HES7-Achilles expres- 
sion in multiple cells, we first selected tracks that were followed for 
multiple periods of oscillations. We used a minimal duration of 15 h and 
Fourier transform larger than a lower threshold; the selection using 
Fourier transform did not significantly modify the statistics. Next, 
we computed the Kuramoto order parameter (also known as vector 
strength””*>) of a given set of signal phases. Considering n signals 
with phases $\theta_1, \ldots, \theta_n$, the Kuramoto order parameter Z is defined by 

$$Z = \frac{1}{n}\sum_{j=1}^{n} e^{i\theta_j},$$

in which i is the imaginary unit. This provides a complex number, 
the angle of which corresponds to the average phase and the modulus 
(norm) of which quantifies the level of synchrony. The modulus of Z 
is indeed equal to 1 when all oscillators have the same phase (in which 
case $Z = e^{i\theta}$, in which θ is the common phase of all oscillators), and it is 
equal to 0 when the phases are uniformly spread between 0 and 2π. 
For uniformly distributed phases with standard deviation equal to σ, 
the amplitude of the Kuramoto order parameter is equal to sin(σ)/σ, a 
function smoothly decaying from 1 to 0 as σ goes from 0 to π. 

Using the phases we derived for each track, we evaluated as a func- 
tion of time the order parameter and its modulus. Because of natural 
experimental fluctuations and the finite number of cells considered, 
asynchronous cells are characterized by a low—but non-zero—Kura- 
moto order parameter. To assess whether the observed Kuramoto order 
parameter was statistically consistent with synchrony, we evaluated 
what the level of Kuramoto order parameter norm would be for asynchronous 
sets of cells. To this end, we used our evaluated phases $\theta_1(t)$, 
..., $\theta_n(t)$ and constructed multiple surrogate datasets by shuffling the 
phase relationships between those trajectories, but preserving their 
intrinsic frequency of oscillations. To this end, we drew time-shifts 
uniformly in [0, T], in which T is the total time considered for the phases, 
for each cell. This yields n times $\tau_1, \ldots, \tau_n$, from which we formed the 
set of phases $\theta_1(t + \tau_1), \ldots, \theta_n(t + \tau_n)$, 
wrapped on the interval [0, T] (that is, the times $t + \tau_i$ are taken modulo 
T), and computed the associated Kuramoto order parameter. We repeated this 
randomization 1,000 times and obtained a stable distribution of the 
Kuramoto order parameter for phases with no specific phase relation- 
ship. This provided a level of Kuramoto order parameter consistent 
with asynchrony. We then tested whether the order parameter found 
for the original data was consistent with synchrony by comparing this 
value to the distribution of surrogate order parameters. 
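
The order parameter and the surrogate test can be sketched in Python as follows. The function names are hypothetical, shifts are drawn in whole frames, and each surrogate is summarized by the time average of the modulus of Z; the latter is an assumption of this sketch, as the summary statistic is not specified above.

```python
import cmath
import math
import random


def kuramoto(phases):
    """Complex Kuramoto order parameter Z of a set of phases (in radians)."""
    return sum(cmath.exp(1j * p) for p in phases) / len(phases)


def surrogate_moduli(phase_tracks, n_surrogates=1000, seed=0):
    """Time-averaged |Z| for surrogates in which every track is circularly
    shifted by its own random offset, destroying phase relationships while
    preserving each cell's oscillation."""
    rng = random.Random(seed)
    total = len(phase_tracks[0])  # all tracks are assumed to have equal length
    results = []
    for _ in range(n_surrogates):
        shifts = [rng.randrange(total) for _ in phase_tracks]
        moduli = [abs(kuramoto([track[(t + s) % total]
                                for track, s in zip(phase_tracks, shifts)]))
                  for t in range(total)]
        results.append(sum(moduli) / total)
    return results


# Hypothetical example: six perfectly synchronous cells with a period of 20 frames.
tracks = [[2 * math.pi * t / 20 for t in range(40)] for _ in range(6)]
observed = sum(abs(kuramoto([trk[t] for trk in tracks])) for t in range(40)) / 40
surrogates = surrogate_moduli(tracks, n_surrogates=50, seed=1)
fraction_above = sum(s >= observed for s in surrogates) / len(surrogates)
```

Here fraction_above estimates how often asynchronous surrogates reach the observed level of synchrony.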


Spatiotemporal wave 

To assess whether the data were organized into a spatiotemporal wave 
pattern, we used our extensive dataset containing both the instan- 
taneous positions and instantaneous phases for the cells that were 
detected by our automatic segmentation and tracking algorithm. For 
each pair of cells, we computed their instantaneous (physical) distance 
as well as their phase shift. This provided us with a very large dataset, 
which we organized according to ranges of distances, chosen so that 
each set contained approximately the same number of cell pairs. We 
used distances of less than 160 µm, between 160 and 265 µm, between 
265 µm and 530 µm and larger than 530 µm; the number of cells at a 
distance larger than 530 µm was not kept equal to the other numbers 
to keep sufficient resolution. We then plotted the distribution of phase 
shifts for each distance class, and used the two-sample Kolmogorov- 
Smirnov test (MATLAB function kstest2) to compare these distributions 
two-by-two, accounting for the classical sample-size bias of the test 
by selecting large subsets of equal size for each distance class*’, and 
obtained a P value for whether the two samples were drawn from the 
same distribution. We consistently found that the distribution of phase 
shifts was not dependent on the distance between cells. 


Diffusion coefficient 

To characterize cellular movement from automated cell tracks and 
test the hypothesis that the movement of the cells was consistent with 
freely diffusing particles (Brownian motion), we computed the mean 
square displacement of each cell in an automatically identified track in 
a given time lag. In detail, the mean square displacement is defined by: 

$$\Delta_k(\tau) = \left\langle \lVert x_k(t+\tau) - x_k(t) \rVert^2 \right\rangle_t,$$

in which k is a tracked cell label, $x_k(t)$ is the position of cell k at time t, 
τ is the time lag and the angular brackets indicate that an average over 
all possible values of t is taken (that is, if track k lasts up to time $T_k$, 
the average is taken for $t \in \{1, \ldots, T_k - \tau\}$). Freely 
moving cells with diffusivity D should have a linear mean squared displacement 
$\Delta_k(\tau) = 4D\tau$. By fitting a linear curve to the mean square displacement 
for all cells, we obtained an estimate for D as well as a P value 
for assessing the validity of the linear fit (ANOVA). 
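
A minimal Python sketch of the mean square displacement and of the estimate of D is given below. The function names and the example track are hypothetical, and the line is fitted through the origin by least squares; whether the fit used here included an intercept is not stated, so this is an assumption.

```python
def mean_square_displacement(track, lag):
    """Mean square displacement of one 2D track, a list of (x, y), at a given lag."""
    n = len(track) - lag
    return sum((track[t + lag][0] - track[t][0]) ** 2 +
               (track[t + lag][1] - track[t][1]) ** 2 for t in range(n)) / n


def fit_diffusivity(lags, msd_values):
    """Least-squares slope through the origin of MSD against lag, divided by 4,
    following MSD = 4 * D * lag for free diffusion in two dimensions."""
    slope = sum(l * m for l, m in zip(lags, msd_values)) / sum(l * l for l in lags)
    return slope / 4.0


# Hypothetical values: an MSD that grows as 4 * lag corresponds to D = 1.
print(fit_diffusivity([1, 2, 3], [4.0, 8.0, 12.0]))  # 1.0

# A cell moving in a straight line is ballistic, not diffusive: MSD grows as lag squared.
line = [(float(t), 0.0) for t in range(10)]
print(mean_square_displacement(line, 2))  # 4.0
```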


Period of oscillations 

The period of oscillations in automatically tracked cells was computed 
using fast Fourier transform (MATLAB function fft) of the centred 
HES7-Achilles expression for each cell tracked (the centring con- 
sisted only of removing the mean value of the signal in time). Peaks of 
the Fourier transforms were identified using the findpeaks MATLAB 
function, and the most prominent peak was used to compute the 
period of the signal. To confirm this estimate of the period, we used 
an alternative method based on identifying the peaks in HES7-Achilles 
expression for each cell and computing the difference between the 
times of the peaks. We found a very good agreement between the two 
methods. 
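
The Fourier-based period estimate can be sketched in Python as follows. The function name is hypothetical, a naive discrete Fourier transform replaces MATLAB's fft, and the frequency with the largest magnitude is taken directly rather than through findpeaks prominence; both are simplifications.

```python
import cmath
import math


def dominant_period(signal, dt):
    """Period (in units of dt) of the strongest Fourier component of the
    mean-centred signal, or None if the signal is flat."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]
    best_k, best_mag = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n * dt / best_k if best_k else None


# Hypothetical trace: period of 20 frames sampled every 18 min.
trace = [math.sin(2 * math.pi * t / 20) for t in range(100)]
print(dominant_period(trace, 18))  # 360.0 (minutes)
```

The resolution of this estimate is limited by the track length, which is one reason to cross-check it against peak-to-peak intervals.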


Phase shifts 

To assess the relative phase shift between two samples at the single-cell 
level (for example, control versus PD03), we first obtained the phases 
as a function of time for each automatically tracked cell as described 
in ‘Phase analysis’. We then calculated the phase difference between 
all possible pairs of cells between the two samples at all time points, 
and displayed these data in histograms. We additionally computed 
the mean phase shift across all time points for all pairs of cells and 
the corresponding s.d. To compare the phase shift between different 
pairs of samples, we used non-parametric one-way ANOVA with the 
Kruskal-Wallis test. 


Cell division analysis 

Our automated cell tracking algorithm (described in ‘Automatic image 
segmentation and cell tracking’) did not detect cell division, but rather 
selected one daughter cell at random and continued tracking without 
interruption. Thus, we resorted to manual tracking for the detection 
of cell division. We used the Fiji*? plugin ManualTracks and recorded 
the time points at which cells underwent mitosis. Manual tracking was 
performed on the pCAG-H2B-mCherry channel, such that chromatin 
compaction during cell division was clearly identifiable and tracks 
were completely independent from HES7-Achilles intensity. Cell divi- 
sion time was defined as the time that elapsed between the time a cell 
first divides and the time that one of its daughter cells divides again. 
Once cell division events were manually identified, we used an auto- 
matic tracking to recover the tracks before and after cell division. In 
detail, given a cell division event at time t and at a given location of 
the field, we identified in our automatically identified cell the closest 
match. When the distance between the automatically and manually 
identified cells was small enough (here, below a distance of 21 µm), we 
recovered the HES7-Achilles expression from the associated already 
identified track. If there was no cell identified near the manually iden- 
tified location (in rare cases, manually identified dividing cells had 
not been detected by the algorithm), we used locally a version of the 
automatic tracking algorithm (in a sub-image of 5.3 x 5.3 µm) to derive 
a cell location and an associated HES7-Achilles expression. These data 
were then processed exactly as the automatically identified tracks, 
and we obtained the phases of the oscillations of the dividing cells. 
We then built the histogram of the phases at cell division, and used the 
one-sample Kolmogorov-Smirnov test to assess whether the distribu- 
tion of phases was consistent with a uniform distribution, indicating 
no correlation between phase in the HES7-Achilles expression and cell 
division. To this end, we used the makedist MATLAB function to create a 
uniform distribution and used the kstest MATLAB function to compare 
our sample of phases at cell division with a uniform distribution. This 
provided a test of hypothesis together with the P value indicated in the 
legend of Extended Data Fig. 5j. 
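
The comparison of division phases with a uniform distribution can be illustrated with a Python sketch of the one-sample Kolmogorov-Smirnov statistic. The function name is hypothetical and phases are assumed to lie in [0, 2π); only the test statistic is computed, whereas the MATLAB kstest function used here also returns the P value.

```python
import math


def ks_uniform_statistic(phases):
    """Largest distance between the empirical distribution of the phases
    (radians in [0, 2*pi)) and the uniform distribution on that interval."""
    # Dividing by 2*pi gives the uniform cumulative distribution at each phase.
    cdf_values = sorted(p / (2 * math.pi) for p in phases)
    n = len(cdf_values)
    distance = 0.0
    for i, f in enumerate(cdf_values):
        distance = max(distance, (i + 1) / n - f, f - i / n)
    return distance


# Hypothetical phases: evenly spread values give a small statistic,
# identical values give the maximum of 1.
even = [2 * math.pi * (i + 0.5) / 10 for i in range(10)]
print(ks_uniform_statistic(even))
print(ks_uniform_statistic([0.0] * 4))  # 1.0
```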


Statistical analyses 

In box-and-whiskers plots, the middle hinge corresponds to median, 
lower and upper hinges correspond to the first and third quartiles, 
respectively, and the lower and upper whiskers correspond to the mini- 
mum and maximum, respectively. Ordinary one-way ANOVA was per- 
formed in cases in which data were Gaussian, and Tukey or Bonferroni 
correction was used for multiple comparisons. In cases in which data 
were not Gaussian (for example, phase shifts), we used a non-parametric 
one-way ANOVA with the Kruskal-Wallis test. For time series, such as 
the Kuramoto order parameter over time, we used paired ANOVA with 
matched time points. Details of statistical analyses are indicated in 
the figure legends. All differentiation experiments were performed a 
minimum of three independent times (rounds of differentiation), each 
containing at least three technical replicates (wells) per condition. 


Preparation of single-cell suspensions for scRNA-seq 

Single-cell dissociation protocols for the various tissues and cells ana- 
lysed were optimized to achieve >90% viability and minimize doublets 
before sample collection. For human iPS differentiation, 3 x 10* MSGN1-Venus 
cells were seeded on Matrigel-coated 24-well plates 48 h before 
differentiation. Cells were differentiated as described in ‘Human iPS 
cell culture and 2D differentiation’. All samples (days 1-4 and human 
iPS cell control samples) were dissociated, collected and captured on 
an inDrops setup on the same day, two biological replicates per sam- 
ple. For dissociation, cells were briefly rinsed in PBS, and incubated in 
TrypLE Express (Gibco) for 5 min at 37 °C. Dissociated cells were run 
through a 30-µm cell strainer, spun down at 200g for 4 min at 4 °C and 
resuspended in 100 µl 0.5% BSA in PBS. 

For mouse ES cell differentiation, 1 x 10* pMsgn1-Venus cells were 
seeded on fibronectin-coated 6-well plates and differentiated as 
described in ‘Mouse ES cell culture and 2D differentiation’. Samples 
for day 0 and days 2-5 were dissociated in TrypLE Express (Gibco) for 
3-10 min, washed several times in PBS, passed through a 40-µm cell 
strainer and resuspended in 0.1% BSA in PBS with Opti-Prep at a final 
density of 200,000 cells per millilitre. All samples were dissociated, 
collected and captured on the same day in biological duplicates. 

For generating cell suspensions from mouse embryo tail buds, E9.5 
embryos (25-28 somite stage) from CD-1 IGS mice (Charles River) were 
collected and the posterior part of the embryo, including the three 
most recently formed pairs of somites, was carefully dissected from 2 
littermate embryos and subsequently processed as separate samples. 
Tissues were collected in PBS and dissociated in TrypLE Express for 
10 min at 37 °C. Cells were rinsed in PBS and EDTA, transferred to 0.5% 
BSA in PBS, mechanically separated by trituration and run through a 
30-µm cell strainer. Cells were spun down at 200g for 4 min at 4 °C and 
resuspended in 100 µl 0.5% BSA in PBS. 

The following numbers of cells were sequenced per sample. (1) Human 
iPS cell differentiation samples (two biological replicates processed 
independently), for each replicate: human iPS cell control, 1,000 
cells; day 1, 1,500 cells; day 2, 1,500 cells; day 3, 1,500 cells; and day 4, 
1,500 cells. (2) Mouse ES cell differentiation samples: ES cell day 0, 
2,341 cells; day 2, 2,417 cells; day 3: rep. 1, 3,106 cells; rep. 2, 3,189 cells; 
day 4: rep. 1, 2,939 cells; rep. 2, 2,532 cells; day 5: rep. 1, 1,894 cells; rep. 
2, 3,060 cells. (3) Mouse embryo samples: tail-bud cells from two E9.5 
embryos (2 × 3,000 cells processed independently). 

Every sample was collected in biological duplicate, and sequencing 
data from both samples were combined for data analysis. The actual 
number of cells captured on inDrops was twice as many as sequenced, 
for backup purposes. 


Barcoding, sequencing and mapping of single-cell 
transcriptomes 

Single-cell transcriptomes were barcoded using inDrops as previously 
reported, using V3 sequencing adapters. Following within-droplet 
reverse transcription, emulsions consisting of about 1,000-3,500 cells 
were broken, frozen at −80 °C, and prepared as individual RNA-seq 
libraries. inDrops libraries were sequenced on an Illumina NextSeq 
500 using the NextSeq 75 High Output Kits using standard Illumina 
sequencing primers and 61 cycles for read 1 and 14 cycles for read 2, 
8 cycles each for index read 1 and index read 2. Raw sequencing data 
(FASTQ files) were processed using the inDrops.py bioinformatics pipe- 
line available at https://github.com/indrops/indrops. Transcriptome 
libraries were mapped to human or mouse reference transcriptomes 
built from the GRCh37/hg19 (GCF_000001405.13) or GRCm38/mm10 
(GCF_000001635.20) genome assemblies, respectively. Bowtie version 
1.1.1 was used with parameter -e 200. 


Processing of scRNA-seq data 

Single-cell counts matrices were processed and analysed using ScanPy 
(1.4.3) and custom Python scripts (Code Availability). Low-complexity 
cell barcodes, which can arise from droplets that lack a cell but contain 
background RNA, were filtered in two ways. First, inDrops data were 
initially filtered to only include transcript counts originating from 
abundantly sampled cell barcodes. This determination was performed 
by inspecting a weighted histogram of unique molecular identifier-gene 
pair counts for each cell barcode, and manually thresholding to 
include the largest mode of the distribution (in all cases >80% of total 
sequencing reads). Second, low-complexity transcriptomes were fil- 
tered out by excluding cell barcodes associated with <250 expressed 
genes. Transcript unique molecular identifier counts for each biological 
sample were then reported as a transcript × cell table, adjusted by a 
total-count normalization, log-normalized, and scaled to unit variance 
and zero mean. Unless otherwise noted, each dataset was subset to the 
2,000 most highly variable genes, as determined by a bin-normalized 
overdispersion metric. Mouse E9.5 data were filtered for doublet-like 
cells with Scrublet, which simulates synthetic doublets from pairs of 
scRNA-seq profiles and assigns scores based on a k-NN classifier on the 
data transformed by principal component analysis (PCA). 
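
The filtering and normalization steps described in this section can be sketched in plain Python on a small cell × gene count table. This is an illustrative stand-in for the ScanPy calls, not the authors' code: the function names, the `target_sum` of 10,000 counts per cell and the use of the population standard deviation are assumptions made for the sketch.

```python
import math
from statistics import mean, pstdev


def filter_cells(counts, min_genes=250):
    """Keep cells (rows) in which at least `min_genes` genes have a count > 0."""
    return [cell for cell in counts if sum(c > 0 for c in cell) >= min_genes]


def normalize_log_scale(counts, target_sum=10_000):
    """Total-count normalize, log-transform, then z-score each gene.

    `target_sum` is an assumed per-cell library size after normalization.
    """
    logged = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            raise ValueError("cell with zero counts; filter cells first")
        logged.append([math.log1p(c * target_sum / total) for c in cell])

    n_genes = len(logged[0])
    scaled = [[0.0] * n_genes for _ in logged]
    for g in range(n_genes):
        column = [cell[g] for cell in logged]
        mu, sd = mean(column), pstdev(column)
        for i, value in enumerate(column):
            # Genes with no variance are left at zero rather than dividing by 0.
            scaled[i][g] = (value - mu) / sd if sd > 0 else 0.0
    return scaled


# Toy table (hypothetical): 4 cells x 3 genes, with a threshold of 2 genes
# standing in for the 250-gene threshold used on real data.
toy_counts = [[5, 0, 3], [0, 0, 1], [2, 2, 2], [1, 4, 0]]
kept = filter_cells(toy_counts, min_genes=2)
scaled = normalize_log_scale(kept)
print(len(kept), "cells kept;", len(scaled[0]), "genes scaled")
```

Selection of the 2,000 most highly variable genes and doublet scoring with Scrublet are left out of the sketch, because both depend on library-specific details that are not given in the text.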


Low-dimensional embedding and clustering 

Unless otherwise stated, processed single-cell data were projected into 
a 50-dimensional PCA subspace. The mouse E9.5 PSM nearest-neighbour 
graph (k = 20) used Euclidean distance and 20 PCA dimensions. The 
mouse ES cell and human iPS cell neighbour graphs were constructed 
using the batch-balanced bbknn method. Clustering was performed 
using the Louvain and Leiden community detection algorithms. 


Identification of differentially expressed genes 

Transcripts with significant cluster-specific enrichment were identi- 
fied by a two-sided Wilcoxon rank-sum test comparing cells of each 
cluster to cells from all other clusters in the same dataset. Genes were 
considered differentially expressed if they met the following criteria: 
log-transformed fold change > 0, adjusted P value < 0.05. False discovery 
rate (FDR) correction for multiple hypothesis testing was performed 
as described by Benjamini and Hochberg. The top 100 differentially 
expressed genes, ranked by FDR-adjusted P values, associated fold 
changes, and sample sizes (number of cells per cluster) are reported 
in Supplementary Table 1. Gene names for the top 20 differentially 
expressed transcripts are reported in Extended Data Figs. 2d (mouse 
E9.5), 3c (mouse E9.5 PSM), h (mouse ES cell) and m (human iPS cell). 
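
As an illustration of the selection criteria above, the following sketch applies the standard Benjamini-Hochberg step-up adjustment to a list of P values and then tests the two stated thresholds (log fold change > 0, adjusted P < 0.05). The function names and toy P values are hypothetical; in practice the Wilcoxon rank-sum P values would come from a statistics library.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def is_enriched(log_fold_change, adjusted_p, alpha=0.05):
    """Apply the two criteria stated in the text."""
    return log_fold_change > 0 and adjusted_p < alpha


# Toy example (hypothetical P values for four genes in one cluster).
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
```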


Pseudo-spatiotemporal ordering and identification of 
dynamically varying genes 

Pseudo-spatiotemporal orderings were constructed by randomly 
selecting a root cell from the following clusters: neuromesodermal 
progenitor (mouse E9.5 PSM, Fig. 2a); day 0 ES cell (mouse ES cell, 
Fig. 2c); day 0 iPS cell (human iPS cell, Fig. 2e), and calculating the 
diffusion pseudotime distance of all remaining cells relative to the root. 
Trajectories were assembled for paths through specified clusters, with 
cells ordered by diffusion pseudotime values, as previously reported. 
Dynamically variable genes along the mouse E9.5 PSM trajectory were 
identified as follows. In brief, sliding windows of 100 cells were first 
scanned to identify the 2 windows with maximum and minimum aver- 
age expression levels for all genes individually. For each gene, a t-test 
was then performed between these 2 sets of 100 expression measure- 
ments (FDR < 0.01). Scaled expression values for significant genes 
were then smoothed over a sliding window of 100 cells, ranked by 
peak expression and plotted as a heat map, shown in Fig. 2c. The full 
list of dynamically expressed genes appears in Supplementary Table 2. 
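
The sliding-window scan described above can be sketched for a single gene as follows. The text does not state which variant of the t-test was used, so Welch's unequal-variance statistic is assumed here; the function names and toy values are hypothetical. Converting the statistic to a P value and applying the FDR threshold across genes would require a statistics library and is omitted.

```python
import math
from statistics import mean, variance


def extreme_windows(expression, window=100):
    """Return the windows with the highest and lowest mean expression.

    `expression` holds one gene's values with cells ordered by pseudotime.
    """
    if len(expression) < window:
        raise ValueError("fewer cells than the window size")
    current = sum(expression[:window])
    sums = [current]
    for start in range(1, len(expression) - window + 1):
        # Slide by one cell: add the entering value, drop the leaving one.
        current += expression[start + window - 1] - expression[start - 1]
        sums.append(current)
    high = max(range(len(sums)), key=sums.__getitem__)
    low = min(range(len(sums)), key=sums.__getitem__)
    return expression[high:high + window], expression[low:low + window]


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed variant)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if standard_error == 0:
        raise ValueError("both samples have zero variance")
    return (mean(a) - mean(b)) / standard_error


# Toy gene (hypothetical): 20 ordered cells, window of 5 instead of 100.
toy_gene = [0] * 5 + [1, 2, 3, 4, 5] + [10, 11, 12, 13, 14] + [0] * 5
high_window, low_window = extreme_windows(toy_gene, window=5)
print(high_window, low_window, welch_t(high_window, low_window))
```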


Machine-learning classification of cell states 

Cell state prediction used the KNeighborsClassifier, RandomForestClassifier, 
LinearDiscriminantAnalysis (LDA) and MLPClassifier (neural network) 
classifier methods from scikit-learn (0.20.3). Classifiers 
were trained on the full Louvain cluster-annotated, PCA subspace-projected 
mouse E9.5 dataset (n = 4,367 cells) with default settings and 
k = 20 for KNeighborsClassifier. Mouse ES and human iPS cell states 
were predicted after subsetting matching gene symbols for the E9.5 
variable gene list, and projecting into the E9.5-defined PCA subspace. 
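
To make the nearest-neighbour prediction step concrete, here is a minimal plain-Python sketch of k-NN classification by majority vote in a shared low-dimensional space. It is not the scikit-learn implementation used in the study: the function name, the two-dimensional toy coordinates and the cluster labels are assumptions, and the fraction of agreeing neighbours is used as a simple prediction score.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=20):
    """Predict a label for `query` from its k nearest training points.

    Returns the majority label and the fraction of neighbours that voted
    for it. Distances are Euclidean in the shared (for example, PCA) space.
    """
    if k > len(train_points):
        raise ValueError("k exceeds the number of training points")
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


# Toy reference (hypothetical): two annotated clusters in two dimensions.
reference_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
reference_labels = ["NMP", "NMP", "NMP", "PSM", "PSM", "PSM"]

# A query cell, already projected into the reference-defined space.
print(knn_predict(reference_points, reference_labels, (9, 9), k=3))
```

The key design point mirrored from the text is that query cells are projected into the subspace defined by the reference dataset before neighbours are sought, so that distances are comparable.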


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


High-throughput sequencing data, raw sequencing data, raw and normalized count data, and single-cell clustering assignments generated in this study have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession number GSE114186, and can be visualized at https://tinyurl.com/DiazPourquie2019. Source Data corresponding to the following figure panels are available with the paper: Figs. 1b-f, 2a-h, 3a-d, f, h, 4b-i and Extended Data Figs. 1c, d, f-k, m-s, 2a-c, 3a, b, d-g, i-l, n, o, 4a-d, 5b, c, g-w, 6b, c, f-q, s-u, 7b-d. Online interactive versions and downloadable versions of the analysed scRNA-seq datasets, as well as scRNA-seq transcript × cell count tables, can be accessed at https://tinyurl.com/DiazPourquie2019, as follows. Mouse E9.5 t-distributed stochastic neighbour-embedding (t-SNE) clustering analysis (Extended Data Fig. 2c) data are available from https://tinyurl.com/DiazPourquie2019-mE95. The mouse E9.5 k-NN graph of paraxial mesoderm and neural clusters (Fig. 2a, b, Extended Data Fig. 3a-e) is available from https://tinyurl.com/DiazPourquie2019-mE95-PSM. Data related to mouse ES cell cultures from day 0 to day 5 (Fig. 2c, d, Extended Data Fig. 3f-j) are available from https://tinyurl.com/DiazPourquie2019-mESC. Data related to human iPS cell cultures from day 0 to day 4 (Fig. 2e, f, Extended Data Fig. 3k-o) are available from https://tinyurl.com/DiazPourquie2019-hIPSC. Additional data, such as raw image files, are available from the corresponding author upon request; all materials used in this study, including stem cell lines carrying knock-in reporters, are available by request from the corresponding author. 


Code availability 


Single-cell sequencing data were processed and analysed using publicly available software packages: https://github.com/indrops/indrops and https://github.com/AllonKleinLab/SPRING. Downstream analysis was performed in ScanPy (1.4.3), using Python 3.6.8. Python code and Jupyter notebooks for reproducing single-cell analyses appearing in Fig. 2 and Extended Data Figs. 2-4 are available at https://github.com/wagnerde/Diaz2019. This GitHub link also includes detailed instructions for installing the necessary Python software environment, including the following packages and their dependencies: anndata (0.6.22.post1), bbknn (1.3.6), fa2 (0.3.5), ipython (7.8.0), jupyterlab (1.1.4), leidenalg (0.7.0), louvain (0.6.1), matplotlib (3.0.3), multicoretsne (0.1), numba (0.45.1), numpy (1.17.2), pandas (0.25.1), pytables (3.5.2), python (3.6.7), python-igraph (0.7.1.post7), scanpy (1.4.4.post1), scikit-learn (0.21.3), scipy (1.3.1), scrublet (0.2.1), seaborn (0.9.0), statsmodels (0.10.1) and umap-learn (0.3.10). Force-directed layouts of single-cell graphs were generated using the ForceAtlas2 algorithm in Gephi (0.9.1). MATLAB code used for single-cell tracking and synchronization analysis is available at https://github.com/jonathan-touboul-brandeis/HumanSegmentationClock. 


Extended Data Fig. 1 | Differentiation of mouse and human pluripotent stem cells towards PSM fate for the characterization of the segmentation clock in vitro. a, Scheme illustrating the maturation stages of paraxial mesoderm. DF, determination front; pPSM. b, Immunofluorescence staining for the cadherins CDH1 and CDH2 (top), and the pluripotency factor POU5F1 (bottom) in differentiating mouse ES cells (ESCs) (left) and human iPS cells (iPSCs) (right). n = 4 independent experiments. Scale bar, 100 µm. c, qRT-PCR for the epiblast marker Fgf5, the neuromesodermal progenitor or mesodermal marker T, and the mesodermal precursor cell and PSM markers Tbx6, Msgn1 and Rspo3 on days 2-6 of mouse ES cell differentiation. Relative expression is shown as the fold change relative to ES cells at day 0. Mean ± s.d. n = 3 biological replicates. d, Percentage induction of the mouse (m)ES cell pMsgn1-Venus reporter and the human (h)iPS cell MSGN1-Venus reporter, as determined by fluorescence-activated cell sorting (FACS). Mean ± s.d. n = 12 independent experiments (mouse ES cell), n = 8 independent experiments (human iPS cell). e, Gating strategy and representative FACS plots for quantification of pMsgn1-Venus or MSGN1-Venus induction. f, qRT-PCR for cyclic genes (HES7 and LFNG), posterior-PSM markers (MSGN1, TBX6 and RSPO3), determination-front markers (MESP2 and RIPPLY2) and anterior-PSM markers (MEST and FOXC2) on days 1-4 of human iPS cell (iPSC) differentiation. Relative expression is shown as the fold change relative to iPS cells at day 0. Mean ± s.d. n = 3 biological replicates. g, Diagram outlining the targeting strategy used to generate Hes7-Achilles and HES7-Achilles knock-in reporter lines in mouse ES cells and human iPS cells, respectively. h, Normalized HES7-Achilles fluorescence intensity for three PSM cells derived from mouse ES cells, imaged in CL medium on day 4 of differentiation. n = 4 independent experiments. i, Representative Fourier transform of HES7-Achilles oscillations in PSM cells derived from mouse ES cells, indicating the predominant period. n = 19 cells. j, Total time spent in the oscillatory state for Hes7-Achilles PSM cells derived from mouse ES cells, cultured in CL or CLFBR medium from day 4 onwards. The middle hinge corresponds to the median, the lower and upper hinges correspond to the first and third quartiles, respectively, and the lower and upper whiskers correspond to the minimum and maximum, respectively. n = 8 (CL), n = 12 (CLFBR) independent experiments. k, qRT-PCR comparing relative expression levels of Msgn1, Lfng, T and Tbx6 in PSM cells derived from mouse ES cells, cultured in CL or CLFBR medium from day 4 onwards. Relative expression is shown as the fold change relative to ES cells at day 0. Mean ± s.d. n = 3 biological replicates. l, Snapshots of HES7-Achilles fluorescence in PSM cells derived from human iPS cells, showing peaks and troughs over the course of 13.5 h in CL medium on day 2 of differentiation. n = 25 independent experiments. Scale bar, 100 µm. m, Representative quantification of HES7-Achilles fluorescence intensity in a small ROI from day 2 to day 3 of human iPS cell differentiation. n = 25 independent experiments. n, Representative Fourier transform of HES7-Achilles oscillations, indicating the predominant period in PSM cells derived from human iPS cells, in CL medium on day 2. n = 25 independent experiments. o, Representative instantaneous frequency in Hertz (calculated by Hilbert transformation) of HES7-Achilles oscillations in PSM cells derived from human iPS cells, from day 2 to day 3 of differentiation in CL medium. n = 25 independent experiments. p, Representative instantaneous frequency in Hertz (calculated by Hilbert transformation) of HES7-Achilles oscillations in PSM cells derived from human iPS cells, from day 2 to day 3 of differentiation in CLFBR medium. n = 33 independent experiments. q, Quantification of HES7-Achilles fluorescence in human iPS cells differentiated for 48 h without the BMP inhibitor LDN93189 (CHIR99021-only medium). n = 3 independent experiments. r, Total number of HES7-Achilles oscillations for PSM cells derived from human iPS cells, cultured in CL or CLFBR medium from day 2 onwards. Mean ± s.d. n = 15 independent experiments. s, qRT-PCR comparing relative expression levels of HES7, LFNG, TBX6 and MSGN1 in PSM cells derived from human iPS cells, cultured in CL or CLFBR medium from day 2 onwards. Relative expression is shown as the fold change relative to iPS cells on day 0. Mean ± s.d. n = 3 biological replicates. 
Extended Data Fig. 2 | scRNA-seq analysis of the mouse E9.5 embryonic tail 
extent to which a given single-cell transcriptome resembles a linear 
of doublet scores. Scores >0.24 were filtered from subsequent analyses. 


c, t-SNE embedding of E9.5 cells (n = 4,367) post-doublet filtering. Individual cells are coloured according to annotated Louvain cluster identities. d, Top 20 positively enriched transcripts for each Louvain cluster relative to all other clusters, as detected by a two-sided Wilcoxon rank-sum test. Reported transcripts are ranked by FDR-corrected P values (Benjamini-Hochberg). Exact sample sizes are given in Supplementary Table 1. 
Extended Data Fig. 3 | Comparative analysis of PSM differentiation trajectories in vitro and in vivo. a, f, k, ForceAtlas2 layouts of mouse E9.5 embryos, mouse ES cell and human iPS cell single-cell k-NN graphs, coloured by cluster identity and collection time points, as indicated. b, g, l, Confusion matrices plot the overlap of cluster and time-point assignments, row-normalized. c, h, m, Top 20 positively enriched transcripts for Louvain clusters relative to all other clusters in each dataset, as detected by a two-sided Wilcoxon rank-sum test. Reported transcripts are ranked by FDR-corrected P values (Benjamini-Hochberg). Exact sample sizes are given in Supplementary Table 1. d, i, n, ForceAtlas2 layouts of single-cell k-NN graphs, overlaid with log-normalized transcript counts for indicated genes. e, j, o, Top, colours indicate pseudotemporal orderings. Bottom, heat map of selected markers of paraxial mesoderm differentiation. Approximate locations of cluster centres are indicated. 
Extended Data Fig. 4 | Fate outcomes of PSM-directed differentiation in human and mouse-derived cultures. a, c, ForceAtlas2 layouts of indicated single-cell k-NN graphs, overlaid with classifier prediction scores. b, d, Heat […] cell datasets. Columns (individual cells) are grouped by collection time point. Rows are individual HOX genes ordered by position. Approximate anatomical positions of HOX paralogues are indicated on the right. 
Extended Data Fig. 5 | Analysis of the human segmentation clock at the single-cell level. a, Scheme showing the insertion of a constitutively expressed pCAG-H2B-mCherry nuclear label in the safe harbour AAVS1 locus in a HES7-Achilles human-iPS-cell background. b, Diffusion (square micrometres per minute) for individual human HES7-Achilles cells automatically tracked over a period of 24 h. The middle hinge corresponds to the median, the lower and upper hinges correspond to the first and third quartiles, respectively, and the lower and upper whiskers correspond to the minimum and maximum, respectively. n = 76 cells. c, Distribution of pairwise instantaneous phase shifts between individual oscillating human HES7-Achilles cells, binned by instantaneous distance between pairs of cells. P values for the pairwise Kolmogorov-Smirnov test are as follows: <160 µm versus 160-265 µm: 0.6407, <160 µm versus 265-530 µm: 0.1811, <160 µm versus >530 µm: 0.1340, 160-265 µm versus 265-530 µm: 0.1428, 160-265 µm versus >530 µm: 0.6784, and 265-530 µm versus >530 µm: 0.8171. n = 1,000 observations. d, Distribution of phases along the unit circle at early, middle and late time points. Each dot represents one cell. n = 144 cells. e, Illustration of phase determination. Representative raw HES7-Achilles fluorescence profile for an automatically tracked cell (left) and corresponding processed signal along with the inferred phase from Hilbert transform (right). f, Heat map of HES7-Achilles fluorescence intensity over time in automatically tracked cells. Each line represents one cell. n = 144 cells. g, Histogram of the time (hours since onset of imaging) at cell division for manually tracked human HES7-Achilles cells. n = 67 cells. h, Left, immunofluorescence staining for histone H3 phosphorylated at Ser10, in PSM cells derived from human iPS cells, treated with vehicle control (DMSO) or 5 µM aphidicolin for 24 or 48 h, starting on day 2 of differentiation. n = 5 independent experiments. Scale bar, 100 µm. Right, quantification of phosphorylated histone H3 (at Ser10) nuclei as a percentage of total nuclei. The middle hinge corresponds to the median, the lower and upper hinges correspond to the first and third quartiles, respectively, and the lower and upper whiskers correspond to the minimum and maximum, respectively. i, Scatter plot showing the cell-cycle time in PSM cells derived from human iPS cells, cultured in CLFBR medium. Mean ± s.d. n = 26 cells. j, Histogram of the HES7-Achilles oscillatory phase at the time of cell division in human iPSC-derived PSM cells cultured in CLFBR medium. Distribution is not significantly different from the uniform distribution: Kolmogorov-Smirnov test P = 0.225. n = 55 cell divisions. k, Normalized HES7-Achilles fluorescence intensity profiles for 3 individual PSM cells derived from human iPS cells, pre-treated with 5 µM aphidicolin for 24 h. n = 6 independent experiments. l, Kuramoto order parameter over 20 h on day 3 of differentiation for human HES7-Achilles cells treated with vehicle control (DMSO) or 5 µM aphidicolin for 24 h. The synchronization threshold is shown as the mean ± s.d. of the Kuramoto order parameter for the same dataset, but with randomized phases. n = 45 cells (control) or 48 cells (aphidicolin). m, Comparison of the Kuramoto order parameter for oscillating HES7-Achilles cells treated with vehicle control (DMSO) or 5 µM aphidicolin. The middle hinge corresponds to the median, the lower and upper hinges correspond to the first and third quartiles, respectively, and the lower and upper whiskers correspond to the minimum and maximum, respectively. Paired two-sided t-test, P = 0.348. n = 45 cells (control) or 48 cells (aphidicolin). n, qRT-PCR for Notch target genes HES7, NRARP and LFNG in PSM cells derived from human iPS cells, treated with vehicle control (DMSO) or 25 µM DAPT on day 2 of differentiation. Mean ± s.d. n = 3 biological replicates. o, Example of HES7-Achilles fluorescence intensity in a small ROI over a period of 45 h in cells treated with DMSO (vehicle control) or the γ-secretase inhibitor DAPT (25 µM) in CLFBR medium. n = 16 independent experiments. p, Representative example of HES7-Achilles fluorescence intensity profiles for PSM cells derived from mouse ES cells, treated with vehicle control (DMSO) or 25 µM DAPT. n = 13 independent experiments. q, Kuramoto order parameter over 20 h on day 2 of differentiation for human HES7-Achilles cells treated with vehicle control (DMSO) or 25 µM DAPT. The synchronization threshold is shown as the mean ± s.d. of the Kuramoto order parameter for the same dataset, but with randomized phases. n = 131 cells (control) or 110 cells (DAPT). r, Representative immunofluorescence staining for YAP, F-actin (phalloidin) and DAPI nuclear stain in isolated human PSM-like cells treated with DMSO or latrunculin A (350 nM). Scale bar, 50 µm. n = 4 independent experiments. s, ChIP-qPCR fold enrichment of the LFNG and HES7 promoters in chromatin pulled down with an antibody against NOTCH1, relative to isotype IgG controls. Mean ± s.d. iPS-cell control, n = 4; all other conditions, n = 3 biological replicates. t, Mean HES7-Achilles fluorescence intensity for isolated human cells cultured with 350 nM latrunculin A alone, or in combination with 25 µM DAPT. The middle hinge corresponds to the median, the lower and upper hinges correspond to the first and third quartiles, respectively, and the lower and upper whiskers correspond to the minimum and maximum, respectively. n = 18 cells. u, Scatter plot showing the HES7-Achilles oscillatory period for isolated human cells cultured with 350 nM latrunculin A alone, or in combination with 25 µM DAPT. Mean ± s.d. n = 47 (latrunculin A) or 22 (latrunculin A + DAPT) cells. v, Kuramoto order parameter over 18 h on day 2 of differentiation for human HES7-Achilles cells treated with DMSO, latrunculin A alone or latrunculin A in combination with DAPT. The synchronization threshold is shown as the mean ± s.d. of the Kuramoto order parameter for the same dataset, but with randomized phases. n = 53 cells (control), 18 cells (latrunculin A) or 18 cells (latrunculin A + DAPT). w, Comparison of the Kuramoto order parameter in confluent HES7-Achilles cells versus isolated cells treated with 350 nM latrunculin A alone, or in combination with 25 µM DAPT. The middle hinge corresponds to the median, the lower and upper hinges correspond to the first and third quartiles, respectively, and the lower and upper whiskers correspond to the minimum and maximum, respectively. Paired one-way ANOVA with Bonferroni correction: confluent control versus latrunculin A, P=1.16 x 10~°; confluent control versus latrunculin A + DAPT, P= 6.8 x 10“; latrunculin A versus latrunculin A + DAPT, P = 0.304. n = 53 cells (control), 18 cells (latrunculin A) or 18 cells (latrunculin A + DAPT). 


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Extended Data Fig. 6 | The role of FGF and WNT signaling in the regulation of 
segmentation clock properties. a, Immunofluorescence staining for d(pERK 
on day 2 of differentiation, following 3 h of treatment with DMSO (vehicle 
control) or the FGFR inhibitor PD173074 (250 nM) in CL medium. 

n=4 independent experiments. b, Left, qRT-PCR for the FGF ligands FGF17 and 
FGF8 on days 1-4 of human iPS cell differentiation. Relative expression is shown 
as the fold change relative to iPS cells at day 0. Mean+s.d.n=3 biological 
replicates. Right, RT-PCR for the FGF ligand Fgf8 on days 2-6 of mouse ES cell 
differentiation. Relative expression is shownas the fold change relative to ES 
cells at day 0. Mean+s.d.n=3 biological replicates. c, Time-course (RT-PCR 
for the FGF target gene SPRY4 in PSM cells derived from human iPS cells, during 
the 10 himmediately after treatment with vehicle control (DMSO) or 250 nM 
PDO3. Relative expression is shownas the fold change relative to ES cells at 

day 0. Mean+s.d.n=3 biological replicates. d, Immunofluorescence staining 
for dpERK in PSM cells derived from human iPS cells (top) or from mouse ES 
cells (bottom), treated with DMSO or PDO3 (250 nM).n=4 independent 
experiments. e, Immunofluorescence staining for dpERK (left), B-catenin and 
nuclear stain (right) in PSM cells derived from human iPS cells, treated with 
vehicle control (DMSO) or 2 uM XAV. n= 4 independent experiments. Scale bar, 
100 um. f, Representative examples of HES7-Achilles fluorescence intensity 
over the course of 45 hinasmall area of interest within human cultures treated 
with DMSO (vehicle control), the MAPK inhibitor PD0325901 (250 nM), or the 
FGFR inhibitor PD173074 (250 nM) in CLFBR medium. n=16 independent 
experiments. g, HES7-Achilles fluorescence intensity over the course of 45 hin 
asmall ROI within human cultures treated with DMSO (vehicle control), 21M 
XAV or 12 EMIWR-1in CLFBR medium. n =3 independent experiments. h, HES7- 
Achilles oscillatory period of individual cells treated with vehicle control 
(DMSO), 2 UM XAV, 250 nM PD17, or 100 nM, 250 nM or 500 nM PDO3 on day 2 of 
differentiation. Mean +s.d. One-way ANOVA Pvalues (NS, not significant): 
0.9929, 0.4097, 0.9998, 0.9845 and 0.7425, from left to right onthe graph. 
n=27 (XAV),n=48 (100 nM PDO3), n=57 (all others) cells. i, Average 
fluorescence intensity profiles for automatically tracked individual HES7- 
Achilles human cells treated with vehicle control (DMSO) or increasing doses of 
PDO3 (100 nM, 250 nM and 500 nM) on day 2 of differentiation. Mean + 95% 
confidence interval. n= 68 cells (control), 45 cells (100 nM), 35 cells (250 nM) or 
36 cells (SOO nM).j, Representative examples of HES7-Achilles fluorescence 
intensity profiles in a small ROI within human cultures treated with increasing 
doses of PD03 (100 nM, 250 nM and 500 nM) or vehicle control (DMSO). 

n=8 independent experiments. k, Number of HES7-Achilles oscillations before 
arrest in small ROIs within cultures treated with increasing doses of PDO3 

(100 nM, 250 nM and 500 nM). One-way ANOVA: 100 nM versus 250 nM, 
P=0.0042;100 nM versus 500 nM, P=2.0 x 10>. n=6 independent 
experiments. I, Average HES7-Achilles fluorescence intensity in small 

ROIs over the course of the oscillatory regime (before the arrest of oscillations) 
in cells treated with vehicle control (DMSO) or increasing doses of PDO3 

(100 nM, 250 nM and 500 nM). The middle hinge corresponds to the median, 


the lower and upper hinges correspond tothe first and third quartiles, and the 
lower and upper whiskers correspond to the minimum and maximum. One-way 
ANOVA: control versus 100 nM, P= 6.7 x10; control versus 250 nM, 
P=6.5x10™; control versus 500 nM, P=1.9 x 10”; 100 nM versus 250 nM, 
P=1.1x107;100 nM versus 500 nM, P=2.5 x 107°. n=6 independent 
experiments. m, Representative HES7-Achilles fluorescence intensity profiles 
for PSM cells derived from mouse ES cells, treated with vehicle control (DMSO), 
2 uM XAV, or 100 nM, 250 nM or 500 nM PDO3. n=12 (control, XAVand100nM 
PDO3) or n=10 (250 nMand 500 nM PDO3) independent experiments. 

n, Histograms showing the instantaneous phase difference relative to control 
for individual cells treated with vehicle control (DMSO), 2 1M XAV, 250 nM 
PD17, or 100 nM, 250 nM or 500 nM PDO3. Details are given in ‘Phase shifts’ in 
Methods. n was fixed at 11,000 observations. 0, Quantification of the average 
phase difference (in degrees) for HES7-Achilles oscillations in small ROIs in 
cells treated with 250 nM PD17, or 100 nM, 250 nM or 500 nM PDO3 relative to 
control (DMSO) cells. The middle hinge corresponds to the median, the lower 
and upper hinges correspond to the first and third quartiles, respectively, and 
the lower and upper whiskers correspond to the minimum and maximum, 
respectively. n=13 (PD17),n=17 (100 nM), n=7 (250 nM) orn=11 

(500 nM) independent experiments. p, q, Time-lapse qRT-PCR for the cyclic 
genes HES7 (p) and LFNG (q) in PSM cells derived from human iPS cells, under 
control (DMSO) and 250-nM PDO3 conditions. Samples were taken every 

30 minimmediately after treatment. Relative expression is shown as the fold 
change relative to ES cells at day 0. Mean+s.d.n=3 technical replicates. 

r, Outline of the experimental strategy used to assess the effect of FGF 
inhibition in primary mouse PSM cells carrying the LuVeLu reporter. The tail 
bud is dissected from E9.5 transgenic embryos, and cells are dissociated for 
seeding on fibronectin micropatterns. Oscillations of the LuVeLu reporter are 
examined in each micropattern. s, LuVeLu fluorescence intensity profiles in 
mouse tail-bud explant cells cultured on CYTOO micropatterns in CLFBR 
medium containing DMSO (vehicle control) or increasing doses of PDO3 
(0.4 µM, 0.65 µM and 10 µM). n = 2 independent experiments. t, Number of 
LuVeLu oscillations before arrest in mouse tail-bud explant cells cultured on 
CYTOO micropatterns treated with DMSO (vehicle control) or increasing doses 
of PDO3 (0.4 µM, 0.65 µM and 10 µM). Mean ± s.d. One-way ANOVA: 0.4 µM 
versus 0.65 µM, P = 0.0642; 0.4 µM versus 10 µM, P=8.4 x10; 0.65 µM versus 
10 µM, P=2.9 x10 °. n = 10 micropatterns (0.4 µM), n = 7 micropatterns (0.65 µM 
and 10 µM). u, Average period of LuVeLu oscillations in mouse tail-bud explant 
cells cultured on CYTOO micropatterns treated with DMSO (vehicle control) or 
increasing doses of PDO3 (0.4 µM, 0.65 µM and 10 µM). Mean ± s.d. One-way 
ANOVA: control versus 0.4 µM, P = 0.2785; control versus 0.65 µM, P = 0.0658; 
control versus 10 µM, P=2.7 x 10°; 0.4 µM versus 0.65 µM, P = 0.831; 0.4 µM 
versus 10 µM, P=3.05 x107*; 0.65 µM versus 10 µM, P=4 10°. 
n = 18 micropatterns (DMSO), n = 16 micropatterns (0.4 µM), 
n = 12 micropatterns (0.65 µM) and n = 6 micropatterns (10 µM). 
Extended Data Fig. 7 | Control of PSM maturation by FGF and WNT signalling 
in vitro. a, Snapshots of HES7-Achilles;MESP2-mCherry double-reporter cells on 
days 2-5 of differentiation in CLFBR medium at 0, 20, 45 and 60 h. Cultures 
treated with DMSO (control), 25 µM DAPT, 2 µM XAV and 250 nM PDO3 are 
shown. n = 10 independent experiments. Scale bar, 100 µm. b, Time of onset of 
MESP2-mCherry expression in PSM cells derived from human iPS cells treated 
with vehicle control (DMSO), 2 µM XAV, 250 nM PD17 or 100 nM, 250 nM or 
500 nM PDO3. Onset of expression is defined by a threshold of 25 AU. 

Mean +s.d. One-way ANOVA: control versus XAV, P=4.6 x 107; control versus 
100nMPDO3, P=5.1; control versus 250 nM PDO3, P=1.3 x10; control 
versus 500 nM PDO3, P=1.4 x 10°; 100 nM versus 250 nM PDO3, P=2.6 x10; 
100 nM versus 500 nM PDO3, P=7.7 x 10 **; 250 nM versus 500 nM PDO3, 
P=6.9x10°%. n = 10 independent experiments. c, HES7-Achilles and MESP2-mCherry 
fluorescence intensity profiles in small ROIs within PSM cultures 
derived from human iPS cells, treated with 25 µM DAPT on days 2-5 of 
differentiation in CLFBR medium. Mean ± s.d. Dotted line denotes the 
threshold for MESP2 activation (25 AU). n = 15 independent experiments. 

d, qRT-PCR for the genes HES7, LFNG, MSGN1, TBX6, DUSP6, FOXC2, MESP2 and 
RIPPLY2 in PSM cultures derived from human iPS cells, treated for 24 h with 
vehicle control (DMSO) or 250 nM PDO3 in CLFBR medium. Relative expression 
is shown as the fold change relative to iPS cells at day 0. Mean ± s.d. 
n = 3 biological replicates. e, Immunofluorescence staining for TBX6 on day 3 of 
differentiation (CLFBR medium) in cells treated with DMSO or PDO3 (250 nM). 
n = 4 independent experiments. Scale bar, 100 µm. 
Extended Data Table 1 | Single-guide RNAs used in CRISPR-Cas9 targeting 

Target gene | Direction | sgRNA                | PAM site | PAM site mutation in targeting vector 
hHES7       | Antisense | ACCTGCTCGCCCGGACGCCC | GGG      | GGT 
mHes7       | Antisense | TAAGGAGGCACCCAAGCTAC | AGG      | AAG 
hMESP2      | Antisense | GTCTCCAAAACGCGGGCGGT | GGG      | GGT 


Extended Data Table 2 | Primary antibodies for immunofluorescence 

Antibody            | Species | Type       | Source         | Catalog number | Dilution 
OCT3/4              | Mouse   | Monoclonal | Santa Cruz     | sc-5279        | 1:800 
SOX2                | Rabbit  | Polyclonal | Millipore      | AB5603         | 1:300 
T/BRACHYURY         | Goat    | Polyclonal | R&D            | AF2085         | 1:300 
TBX6                | Rabbit  | Polyclonal | Abcam          | ab38883        | 1:300 
CDH1                | Mouse   | Monoclonal | Abcam          | ab76055        | 1:300 
CDH2                | Rabbit  | Polyclonal | Abcam          | ab12221        | 1:300 
dpERK               | Rabbit  | Monoclonal | Cell Signaling | 4370P          | 1:400 
β-CATENIN           | Mouse   | Monoclonal | BD             | 610153         | 1:400 
YAP                 | Mouse   | Monoclonal | Santa Cruz     | sc-101199      | 1:200 
pHistone H3 (Ser10) | Rabbit  | Polyclonal | Santa Cruz     | sc-8656        | 1:350 
Extended Data Table 3 | Primer sequences for qPCR 

Gene           | Forward                  | Reverse                  | Reference 
hTBX6          | AAGTACCAACCCCGCATACA     | TAGGCTGTCACGGAGATGAA     | Loh et al. 
hMESP2         | AGCTTGGGTGCCTCCTTATT     | TGCTTCCCTGAAAGACATCA     | Loh et al. 
hRIPPLY2       | AAGAAGAGGAGACGCCGAAC     | AGTCTGACTGGGTGCCTGAA     | This study 
hFOXC2         | CCTCCTGGTATCTCAACCACAI1  | GAGGGTCGAGTTCTCAATCCC    | Loh et al. 
hDUSP6         | CCAAATCATGGGCTCACTTT     | CCATGCTCACACACACACAC     | This study 
hSPRY2         | CTGTTTGCGGTGAAATGCT      | TTGCCTAGGAGTGTCTGTGTTG   | This study 
hNRARP         | CCTGCGTCACTTTCTGTCCT     | AAGGGTCAGCAGCACTTCC      | This study 
hHES7 promoter | AGATTGTAAGAGGTTGAGGCGGAC | GGAAGGATGACTTGGCGCTC     | This study 
hLFNG promoter | AGGCTCTGGCTGATCGGAAG     | AGGTAATTAGCAGTCACCACCTCC | This study 
mRspo3         | ATGCACTTGCGACTGATTTCT    | CAGCCTTGACTGACATTAGGATG  | This study 
mFgf8          | CATGGCAGAAGACGGAGAC      | CATGCAGATGTAGAGACCTGTC   | Du et al. 



Loh et al.*°, Zhou et al.*° and Du et al.” are cited in the table. 
Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 

[x] The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 

[x] A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 

The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 

[x] A description of all covariates tested 

[x] A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 

[x] A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 

[x] For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 

[x] For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 

[x] For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 

[x] Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection Zen black (Zeiss), MIT Crispr Design Tool (www.crispr.mit.edu; no longer available), In-Fusion cloning tools (https://www.takarabio.com/learning-centers/cloning/in-fusion-cloning-tools), NEBuilder Assembly Tool (https://nebuilderv1.neb.com/), Geneious 9.1.5, ApE v2.0.49.10. 

Data analysis Graphpad Prism 7, MATLAB R2018b, CFX Manager 3.1, ImageJ (Fiji), https://github.com/indrops/indrops, https://github.com/AllonKleinLab/SPRING, ScanPy41 (1.4.3), Python 3.6.8, anndata (0.6.22.post1), bbknn (1.3.6), fa2 (0.3.5), ipython (7.8.0), jupyterlab (1.1.4), 
leidenalg (0.7.0), louvain (0.6.1), matplotlib (3.0.3), multicoretsne (0.1), numba (0.45.1), numpy (1.17.2), pandas (0.25.1), pytables (3.5.2), 
python-igraph (0.7.1.post7), scikit-learn (0.21.3), scipy (1.3.1), scrublet (0.2.1), seaborn (0.9.0), statsmodels (0.10.1), umap-learn (0.3.10), 
ForceAtlas2 algorithm in Gephi (0.9.1). 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 
Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


[x] Life sciences    [ ] Behavioural & social sciences    [ ] Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size Sample sizes were not pre-determined. Rather, we ensured our sample sizes were sufficient by checking that inclusion of additional data 
points did not significantly change the variance (SD) of the data. 


Data exclusions During single cell RNA sequencing quality control steps, low complexity cell barcodes were excluded to avoid droplets that lack a cell but 
contain background RNA. Data were filtered to only include transcript counts originating from abundantly sampled cell barcodes. This 
determination was performed by inspecting a weighted histogram of Unique Molecular Identifier (UMI)-gene pair counts for each cell 
barcode, and manually thresholding to include the largest mode of the distribution (in all cases >80% of total sequencing reads). Additionally, 
low complexity transcriptomes were filtered out by excluding cell barcodes associated with <250 expressed genes. 

For analysis of oscillator synchronization, we excluded non-oscillating tracks to avoid potentially skewing the Kuramoto order parameter. This 
was done by filtering out cells with Fourier transform modulus below a specified threshold (i.e. 500). 
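
The exclusion of non-oscillating tracks described above can be illustrated with a minimal Python sketch. It is not the authors' code: the statement gives only the threshold (500), so the choice of the peak modulus of the mean-subtracted, unnormalized discrete Fourier transform, and the function names `max_fourier_modulus` and `keep_oscillating`, are assumptions made for illustration.

```python
import numpy as np

def max_fourier_modulus(track):
    """Peak modulus of the non-zero-frequency Fourier coefficients of one track.

    Assumption: the trace is mean-subtracted and NumPy's unnormalized FFT is used;
    the source does not state which normalization was applied.
    """
    x = np.asarray(track, dtype=float)
    x = x - x.mean()                    # remove the constant offset
    spectrum = np.abs(np.fft.rfft(x))   # modulus of each Fourier coefficient
    return spectrum[1:].max()           # ignore the zero-frequency bin

def keep_oscillating(tracks, threshold=500.0):
    """Return only the tracks whose peak Fourier modulus reaches the threshold."""
    return [tr for tr in tracks if max_fourier_modulus(tr) >= threshold]

# Illustrative data (hypothetical): one clearly oscillating and one flat, noisy track.
rng = np.random.default_rng(0)
t = np.arange(128)
oscillating = 100.0 * np.sin(2 * np.pi * t / 16)
flat = 50.0 + rng.normal(0.0, 1.0, size=t.size)

kept = keep_oscillating([oscillating, flat])
```

With these synthetic inputs only the sinusoidal track passes the filter. The appropriate threshold depends on the intensity scale and track length, so the value of 500 would need to be re-derived for data acquired differently.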


Replication To ensure the reproducibility of our findings, we carried out all experiments several independent times (exact n for each experiment reported 
in the figure legends). Each independent experiment contained technical triplicates. We ensured that these independent datasets of similar 
size did not change the reported results. 


Randomization Randomization is not relevant as the same cell lines were used in all cases. 


Blinding Blinding is not applicable to data collection (see above). In the case of time-lapse imaging analysis, all labels were removed and individual 
microscopy files were analyzed blindly in ImageJ/MATLAB for all conditions tested. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems 
n/a | Involved in the study 
[ ] | [x] Antibodies 
[ ] | [x] Eukaryotic cell lines 
[x] | [ ] Palaeontology 
[ ] | [x] Animals and other organisms 
[x] | [ ] Human research participants 
[x] | [ ] Clinical data 

Methods 
n/a | Involved in the study 
[ ] | [x] ChIP-seq 
[ ] | [x] Flow cytometry 
[x] | [ ] MRI-based neuroimaging 


Antibodies 


Antibodies used OCT3/4 (Santa Cruz sc-5279 Lot E2215 1:800), SOX2 (Millipore AB5603 Lot 3207627 1:300), T (R&D AF2085 Lot KQP0619021 
1:300), TBX6 (Abcam ab38883 Lot GR3226767-3 1:300), CDH1 (Abcam ab76055 Lot GR260008-4 1:300), CDH2 (Abcam ab12221 
Lot GR139340-27 1:300), dpERK (Cell Signaling 4370P Lot 17 1:400), beta-CATENIN (BD 610153 Lot 2146908 1:400), YAP (Santa 
Cruz sc-101199 Lot 10915 1:200), pHistoneH3 (Santa Cruz sc-8656 Lot D1615 1:350), Notch1 (Cell Signaling 3608S Lot 8 3.3 µg 
per IP), Acetylated (Lys9) Histone H3 (Cell Signaling 9649S Lot 13 0.5 µg per IP), Normal Rabbit IgG (Cell Signaling 2729S Lot 8 3.3 
µg per IP) 


Validation All antibodies were validated by the suppliers and accurately represent expected expression patterns when tested on mouse 
embryos. 
- OCT3/4: Oct-3/4 Antibody (C-10) is recommended for detection of Oct-3/4 of mouse, rat and human origin by WB, IP, IF, IHC(P), 
FCM and ELISA; non cross-reactive with Oct-3/4 isoform B (https://www.scbt.com/p/oct-3-4-antibody-c-10) 
- SOX2: Anti-SOX2 Antibody, Cat. No. AB5603, is a highly specific rabbit polyclonal antibody SOX2 and has been tested for use in 
Immunocytochemistry, and Immunohistochemistry (Paraffin), and Western Blotting. (http://www.emdmillipore.com/US/en/ 
product/Anti-Sox2-Antibody, MM_NF-AB5603) 
- T: Detects human Brachyury in direct ELISAs and Western blots. In direct ELISAs, less than 10% cross-reactivity with recombinant 
human (rh) TBX-6, rhTBX-2, rhTBX-5, and rhTBX-18 is observed. Reactivity to mouse and human. Applications: Western blot, 
ChIP, Immunocytochemistry, Immunohistochemistry. (https://www.rndsystems.com/products/human-mouse-brachyury-antibody_af2085) 

- TBX6: Tested applications, Suitable for: IHC-Fr, ICC/IF, WB (https://www.abcam.com/tbx6-antibody-ab38883.html) 

- CDH1: ab76055 does not cross react with VE Cadherin or N Cadherin. This product may give a weak signal in Western Blot when 
using unstimulated cell lines. Tested applications, Suitable for: Flow Cyt, ICC/IF, IHC-P, IHC-Fr, WB, IP, ELISA, ICC. Species 
reactivity, Reacts with: Mouse, Rat, Horse, Human. (https://www.abcam.com/e-cadherin-antibody-m168-c-terminal- 
ab76055.html) 

- CDH2: Tested applications, Suitable for: Flow Cyt, IHC-Fr, WB, IHC-P, ICC/IF, ELISA. Species reactivity, Reacts with: Mouse, Rat, 
Human 

- dpERK: Phospho-p44/42 MAPK (Erk1/2) (Thr202/Tyr204) (D13.14.4E) XP® Rabbit mAb detects endogenous levels of p44 and 
p42 MAP Kinase (Erk1 and Erk2) when dually phosphorylated at Thr202 and Tyr204 of Erk1 (Thr185 and Tyr187 of Erk2), and 
singly phosphorylated at Thr202. This antibody does not cross-react with the corresponding phosphorylated residues of either 
JNK/SAPK or p38 MAP kinases. Species Reactivity: Human, Mouse, Rat, Hamster, Monkey, Mink, D. melanogaster, Zebrafish, 
Bovine, Dog, Pig, S. cerevisiae. Applications Western Blotting, Immunoprecipitation, Immunohistochemistry (Paraffin) , 
Immunofluorescence (Immunocytochemistry), Flow Cytometry. (https://www.cellsignal.com/products/primary-antibodies/ 
phospho-p44-42-mapk-erk1-2-thr202-tyr204-d13-14-4e-xp-rabbit-mab/4370) 

- beta-Catenin: Reactivity, Human (QC Testing) Mouse, Rat, Dog, Chicken (Tested in Development). Applications, Western blot 
(Routinely Tested), Immunohistochemistry, Immunoprecipitation, Immunofluorescence (Tested During Development). (https:// 
www.bdbiosciences.com/us/applications/research/stem-cell-research/cancer-research/human/purified-mouse-anti-- 
catenin-14beta-catenin/p/610153) 

- YAP: raised against recombinant YAP of human origin, recommended for detection of YAP of mouse, rat and human origin by 
WB, IP, IF, IHC(P) and ELISA. (https://www.scbt.com/p/yap-antibody-63-7). 

- pHistoneH3: recommended for detection of Ser 10 phosphorylated Histone H3 of mouse, rat, human, Drosophila melanogaster, 
Xenopus laevis and avian origin by WB, IP, IF, IHC(P) and ELISA; also reactive with additional species, including equine, 
canine, bovine, porcine and avian. (https://www.scbt.com/p/p-histone-h3-antibody-ser-10) 

- Notch1: Notch1 (D1E11) XP® Rabbit mAb detects intracellular epitopes between 2400 and 2500 amino acids of human Notch1. 
It recognizes both the full-length (~300 KDa) and the NTM region (~120 KDa), which consists of a short extracellular 
juxtamembrane peptide, a transmembrane sequence and the intracellular domain (NICD). The antibody cannot detect the 
extracellular (ligand-binding) domain of Notch1 following cleavage at the S2 site by ADAM-type metalloproteases. Species 
Reactivity: Human, Mouse, Rat. Applications: Western Blotting, Immunoprecipitation, Immunohistochemistry (Paraffin), 
Chromatin IP. (https://www.cellsignal.com/products/primary-antibodies/notch1-d1e11-xp-rabbit-mab/3608) 

- Acetylated (Lys9) Histone H3: Acetyl-Histone H3 (Lys9) (C5B11) Rabbit mAb detects endogenous levels of histone H3 only when 
acetylated on Lys9. This antibody does not cross-react with other acetylated histones. Species Reactivity: Human, Mouse, Rat, 
Monkey, Zebrafish. Applications: Western Blotting, Immunoprecipitation, Immunohistochemistry (Paraffin), 
Immunofluorescence (Immunocytochemistry), Flow Cytometry, Chromatin IP, Chromatin IP-seq. (https://www.cellsignal.com/ 
products/primary-antibodies/acetyl-histone-h3-lys9-c5b11-rabbit-mab/9649) 
Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) human iPS NCRM1 line was obtained from RUCDR Infinite Biologics at Rutgers University 
https://commonfund.nih.gov/stemcells/lines#RMP-generated%20iPSC%20lines 
Mouse E14 mESCs (129P2 genetic background) were obtained from BayGenomics. 


Authentication Authentication was unnecessary due to the unique morphology of human iPS and mouse ESC colonies, as well as their unique 
differentiation potential. We nevertheless stained for pluripotency markers (Oct4, Nanog, Sox2). 


Mycoplasma contamination All cell lines tested negative for mycoplasma contamination 


Commonly misidentified lines No commonly misidentified cell lines were used. 
(See ICLAC register) 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Mus musculus LuVeLu reporter line (Aulehla et al. 2008), E9.5 pups, both male and female 

Wild animals The study did not involve wild animals. 

Field-collected samples The study did not involve samples collected from the field. 

Ethics oversight The study protocol was approved by Brigham and Women's Hospital IACUC/CCM. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Data access links We performed ChIP-qPCR (not ChIP-seq), so all sections referring to high throughput sequencing data are not applicable. 
May remain private before publication. N/A 
Files in database submission N/A 
Genome browser session (e.g. UCSC) N/A 
Methodology 
Replicates For ChIP-qPCR, we used n = 3 or n = 4 independent experiments as replicates (see figure legend for exact sample sizes). 
Sequencing depth N/A 
Antibodies Notch1 (Cell Signaling 3608S Lot 8 3.3 µg per IP), Acetylated (Lys9) Histone H3 (Cell Signaling 9649S Lot 13 0.5 µg per IP), 
Normal Rabbit IgG (Cell Signaling 2729S Lot 8 3.3 µg per IP) 
Peak calling parameters N/A 
Data quality N/A 
Software For qPCR data collection and analysis, CFX Manager 3.1. 


Flow Cytometry 


Plots 


Confirm that: 


x | The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


|| The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers). 


x | All plots are contour plots with outliers or pseudocolor plots. 


x | A numerical value for number of cells or percentage (with statistics) is provided. 


Methodology 
Sample preparation mESC-derived or human iPSC-derived PSM cells were differentiated in CL medium as indicated in Methods section. On the day of 
sorting, they were dissociated with TrypLE (mESC) or Accutase (hiPSC). The cells were resuspended in sorting buffer composed of 
PBS with 1% Penicillin/Streptomycin and 2% fetal bovine serum. 
Instrument BioRad S3 cell sorter with 488 and 561 lasers 
Software BioRad ProSort version 1.5 


Cell population abundance Sorting was not performed; we only used FACS for analysis. 


Gating strategy We first selected for singlets by using an FSC height vs. FSC area gate. We then selected viable cells and excluded cell debris by 
applying an FSC vs. SSC gate. For cell lines carrying Venus reporters (mESC pMsgn1-Venus and hiPSC MSGN1-Venus), we used 
parental cell lines that do not carry the reporters as negative controls to determine the boundary between negative and positive 
cell populations. Parental lines were differentiated to a PSM state in parallel to experimental samples. 


x | Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
Individual cellular activities fluctuate but are constantly coordinated at the 
population level via cell-cell coupling. A notable example is the somite segmentation 
clock, in which the expression of clock genes (such as Hes7) oscillates in synchrony 
between the cells that comprise the presomitic mesoderm (PSM). This 
synchronization depends on the Notch signalling pathway; inhibiting this pathway 
desynchronizes oscillations, leading to somite fusion. However, how Notch 
signalling regulates the synchronicity of HES7 oscillations is unknown. Here we 
establish a live-imaging system using a new fluorescent reporter (Achilles), which we 
fuse with HES7 to monitor synchronous oscillations in HES7 expression in the mouse 
PSM at a single-cell resolution. Wild-type cells can rapidly correct for phase 
fluctuations in HES7 oscillations, whereas the absence of the Notch modulator gene 
lunatic fringe (Lfng) leads to a loss of synchrony between PSM cells. Furthermore, 
HES7 oscillations are severely dampened in individual cells of Lfng-null PSM. However, 
when Lfng-null PSM cells were completely dissociated, the amplitude and periodicity 
of HES7 oscillations were almost normal, which suggests that LFNG is involved mostly 
in cell-cell coupling. Mixed cultures of control and Lfng-null PSM cells, and an 
optogenetic Notch signalling reporter assay, revealed that LFNG delays the 
signal-sending process of intercellular Notch signalling transmission. These results, 
together with mathematical modelling, raised the possibility that Lfng-null PSM cells 
shorten the coupling delay, thereby approaching a condition known as the oscillation 
or amplitude death of coupled oscillators. Indeed, a small compound that lengthens 
the coupling delay partially rescues the amplitude and synchrony of HES7 oscillations 
in Lfng-null PSM cells. Our study reveals a delay control mechanism of the oscillatory 
networks involved in somite segmentation, and indicates that intercellular coupling 
with the correct delay is essential for synchronized oscillation. 


The segmentation clock controls the periodic formation of somites, 
which are repetitive structures that lie along the body axis and give 
rise to vertebrae and ribs. The core of this clock system is controlled 
by cyclic expression of Hes or Her genes (such as Hes7), and by the 
periodic activation of Notch, FGF and WNT signalling pathways in the 
PSM. In mice, the expression of Hes7 oscillates with an approximately 
2-h periodicity, which defines the pace of segmentation. Individual 
PSM cells carry their own clock, but are coupled to each other to generate 
coherent oscillation waves that lead to the formation of segmentation 
boundaries. This coupling is essential for segmentation, because 
uncoupling between cells results in severe somite fusion and morphological 
irregularities. The Notch pathway is a critical mediator 
of this coupling mechanism in a range of species. Hes7 oscillations 
drive oscillatory expression of the Notch ligand gene Delta-like1 (Dll1), 
which affects Hes7 oscillations in neighbouring cells. However, DLL1 
alone is not sufficient for synchronous oscillations. In mice, LFNG, a 
glycosyltransferase for DLL1 and Notch proteins, also exhibits oscillatory 
expression under the control of Hes7 and has previously been 
suggested to be a key coupling factor: Lfng-knockout mice exhibit 
somite segmentation irregularities, as Hes7 expression becomes asynchronous 
between PSM cells. However, most previous analyses 
have been based on fixed samples and, as such, direct observations 
of single-cell clock oscillator dynamics are lacking. 

Clock-gene reporters are powerful tools for studying oscillator 
dynamics but need improvement. Previous imaging analyses using a 
Hes7-promoter-driven destabilized luciferase reporter (pHes7-UbLuc) 
enabled ensemble detection of Hes7 oscillations with a shorter period, 
and a substantially lower amplitude, in Lfng-knockout PSM than in 
the wild type (Extended Data Fig. 1). The overall attenuation seen 
in the Lfng-knockout waveform could possibly result from the lower 
amplitude of oscillations of individual PSM cells, desynchronization 
between PSM cells or both. To discriminate between these possibilities, 
it is imperative to quantitatively follow the oscillations in individual 
PSM cells. A luciferase-based reporter system is not able to quantify 
Hes7 oscillations in individual cells of the intact PSM because of its 
limited spatiotemporal resolution. Therefore, we established novel 
HES7 fluorescent reporter mice. We first produced a reporter for HES7 
using the fast-maturing yellow fluorescent protein (YFP) ‘Venus’ to 
make a Venus-HES7 fusion protein, but we were not able to obtain sufficient 
signal for single-cell quantification (n = 0 out of 7 established 
mouse lines). Considering the short half-life of HES7 (22.3 min), fusion 
to this rapidly degraded protein was thought to prevent Venus from 
synthesizing its chromophore before degradation of the fused protein. 
We therefore performed directed evolution of the Venus gene through 
successive rounds of mutagenesis, screening and validation to improve 
the maturation rate (Methods). In total, 15 residues were subjected to 
site-directed random mutagenesis, and the subsequently constructed 
gene libraries were screened by selecting for bacterial colonies with 
fast maturation. We developed a faster-maturing YFP variant with eight 
amino acid substitutions, which we designate Achilles (Extended Data 
Fig. 2). In vitro experiments revealed that Achilles has the same spectral 
properties and maturation yield as Venus, but that Achilles outperforms 
Venus in terms of maturation speed (Fig. 1b, c, Extended Data Fig. 2). 

We next generated transgenic mice carrying the Hes7-promoter-driven 
Achilles reporter, pHes7-Achilles-Hes7 (hereafter, Hes7-Achilles) 
(Extended Data Fig. 3); this reporter showed higher intensity and oscillation 
amplitude in signal detection than did the Venus reporter. Live 
imaging of PSM tissues from mice carrying the Hes7-Achilles reporter 
(Extended Data Fig. 3b), which showed a pattern that was the most 
similar to endogenous HES7 protein expression among the tested constructs, 
successfully captured oscillatory expression at single-cell 
resolution (in n = 2 out of 3 established mouse lines) (Fig. 1a, d). Furthermore, 
this line rescued the phenotype of Hes7-null mice (Extended 
Data Fig. 4), which suggests that the Achilles-HES7 fusion protein is 
biologically functional. Cell tracking and signal quantification enabled 
us to quantify the phase of HES7 oscillation in individual PSM cells over 
time (Fig. 1e, Extended Data Fig. 5). Using the Hes7-Achilles reporter, 
we compared HES7 oscillation dynamics between wild-type or Lfng+/− 
control and Lfng-knockout mice by culturing whole PSM tissues and 
tail-bud regions. In both control and Lfng-knockout PSM, each cell 
exhibited stable oscillation (Fig. 1d, e, Supplementary Videos 1, 2). 
Notably, in the control PSM, HES7 expression oscillated synchronously 
between neighbouring cells (Fig. 1d, e, 2a). Phase fluctuation sometimes 
occurred, probably owing to cell division and migration, but this was 
immediately corrected in the control, such that synchrony was restored 
by the next cycle (Fig. 2a). By contrast, individual Lfng-knockout cells 
showed a smaller amplitude, a shorter period and more phase fluctuation 
than control cells in the PSM (Fig. 1d, e, 2a-c, Supplementary Videos 
1, 2). The averaged HES7 expression levels decreased in the anterior 
Lfng-knockout PSM compared to the control (Fig. 2d). We assessed 
the degree of synchronization between oscillators by measuring the 
mean phase coherence (using the Kuramoto order parameter), which 
showed that Lfng-knockout PSM cells have a lower synchronization rate 
than control cells (Fig. 2e, f). We also performed tail-bud cultures and 
found milder, but similar, defects in Lfng-knockout tissue (Extended 
Data Fig. 6). Similar defects were observed in another, independent line 
of mice carrying the Hes7-Achilles reporter (Extended Data Fig. 6d-g). 
Furthermore, both acute inhibition of Notch signalling (by treatment 
with the Notch inhibitor DAPT (N-[N-(3,5-difluorophenacetyl)-L-alanyl]-S-phenylglycine 
t-butyl ester)) and acute knockdown of Lfng gradually 
led to similar defects in the control tail-bud cultures (Extended Data 
Fig. 7a-f), as previously observed in Notch-signalling mutants. These 
data indicate that the lower amplitude at the population level in 
Lfng-knockout PSM originates from both lower amplitudes in individual 
cells and reduced synchronization across cells. 
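
To make the link between synchronization and population-level amplitude concrete, the following minimal Python sketch computes the Kuramoto order parameter, R(t) = |(1/N) Σ exp(iθ_j(t))|, for simulated cells and compares it with the amplitude of the averaged signal. It is an illustration, not the analysis pipeline used here: the cell number, phase spreads, unit single-cell amplitude and the 120-min period (taken from the approximately 2-h periodicity mentioned above) are assumed values, and the function names are hypothetical.

```python
import numpy as np

def kuramoto_order_parameter(phases):
    """Kuramoto order parameter R(t) for phases of shape (n_cells, n_timepoints).

    R is 1 when all cells share the same phase and falls towards 0 as the
    phases spread out.
    """
    phases = np.asarray(phases, dtype=float)
    return np.abs(np.exp(1j * phases).mean(axis=0))

def population_amplitude(phases, amplitude=1.0):
    """Half the peak-to-trough range of the cell-averaged cosine signal."""
    mean_signal = (amplitude * np.cos(phases)).mean(axis=0)
    return 0.5 * (mean_signal.max() - mean_signal.min())

# Assumed, illustrative parameters (not measured values).
rng = np.random.default_rng(0)
t = np.arange(0.0, 600.0, 1.0)          # time in minutes
omega = 2 * np.pi / 120.0               # angular frequency for a 120-min period
n_cells = 50

# Each cell keeps a fixed phase offset; only the spread differs between groups.
tight_offsets = rng.normal(0.0, 0.2, size=(n_cells, 1))   # well synchronized
loose_offsets = rng.normal(0.0, 1.5, size=(n_cells, 1))   # poorly synchronized

phases_tight = omega * t + tight_offsets
phases_loose = omega * t + loose_offsets

r_tight = kuramoto_order_parameter(phases_tight).mean()
r_loose = kuramoto_order_parameter(phases_loose).mean()
amp_tight = population_amplitude(phases_tight)
amp_loose = population_amplitude(phases_loose)
```

In this idealized setting every cell oscillates with the same unit amplitude, so the amplitude of the averaged signal equals R. Any further reduction in single-cell amplitude would multiply with this factor, which is why both contributions have to be measured separately in individual cells.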


120 | Nature | Vol580 | 2 April 2020 


[Fig. 1 image, panels a-e. a, schematic of the pHes7-Achilles-Hes7 transgene; b, spectra plotted against wavelength (nm); d, images of control and Lfng-KO PSM at 0, 60, 120, 180 and 210 min, with anterior, posterior and ventral orientation marked; e, fluorescence (AU) and phase (rad) plotted against time (min).] 


Fig. 1 | Characterization of Achilles and analysis of oscillations of the 
Hes7-Achilles reporter in control and Lfng-knockout mice. a, Structure of the 
Achilles-Hes7 transgene. Expression of the Achilles-HES7 fusion protein was 
quantified and calculated for oscillation phase mapping in each PSM cell. UTR, 
untranslated region. b, Excitation (broken) and emission (solid) spectra of 
Achilles (red) and Venus (black). FI, fluorescence intensity. c, Time course of 
fluorescence intensities of Achilles (red) and Venus (black), synthesized from 
their mRNAs using the PURE system (mean values ± s.e.m. from three 
experiments). d, Live imaging of the Hes7-Achilles reporter in wild-type and 
Lfng-knockout (Lfng-KO) PSM by confocal microscopy. Z-projection images of 
the maximum intensity are shown. Signals were obtained at a single-cell 
resolution. The schema indicates the orientation of the PSM. e, Single-cell 
analysis of wild-type and Lfng-knockout PSM. Left, HES7 phase distribution in 
wild-type and Lfng-knockout PSM. Right, Fluorescence and phase time series 
from ten randomly selected cells in the posterior part of wild-type and 
Lfng-knockout PSM. AU, arbitrary unit. Scale bars, 100 µm. 


To address whether the lower amplitude in Lfng-knockout PSM arises 
from the lower amplitude of intrinsic oscillation or a coupling pro- 
cess, we examined expression of the Hes7-Achilles reporter in single 
isolated cells that had no interactions with their neighbouring cells. 
In these single-cell dissociation cultures (Fig. 2g), HES7 oscillations 
were independent of Notch signalling (Extended Data Fig. 7g, h). Under 
this condition, both control and Lfng-knockout PSM cells maintained 
stable oscillations with similar periodicity and amplitudes that were 
only slightly different in each background (about 10% smaller in the 
Lfng-knockout cells) (Fig. 2h-k). Because the oscillation amplitude 
did not markedly differ between control and Lfng-knockout dissoci- 
ated cells, the substantially smaller amplitudes detected in the intact 
Lfng-null PSM (Fig. 2c) probably result from abnormal cell-cell coupling 
through Notch signalling. 

To understand the role of LFNG in cell-cell coupling mediated by 
Notch signalling, we directly assessed how oscillations are affected in 
mixed cultures of wild-type and Lfng-knockout cells using the Hes7- 
Achilles reporter. When a small ratio (1:20) of wild-type cells were mixed 
Fig. 2 | Loss of Lfng affects oscillation period, amplitude and
synchronization. a-k, HES7 oscillations were examined in intact PSM tissues
(a-f) and dissociated PSM cells (g-k). Wild-type (a-f) or Lfng+/− (h-k) PSM
cells were used as controls. a, cos θ plots of single-cell time series in
control and Lfng-knockout PSM. Each row corresponds to one cell. Tracks are
aligned on the basis of the average position along the anterior (A)-posterior
(P) axis. The HES7 expression domain was divided into 5 positions, and
positions 2 and 5 in the schema were used for quantification of the anterior
and posterior PSM, respectively (b-f). b, Oscillation period from time series
of fluorescence of the Hes7-Achilles reporter in single PSM cells.
c, Oscillation amplitude from time series of fluorescence of the Hes7-Achilles
reporter in single PSM cells. d, Average expression levels of fluorescence of
the Hes7-Achilles reporter in single PSM cells. At least 190 cells were
examined for each genotype. Error bars indicate s.e.m. ***P < 0.001,
****P < 0.0001, unpaired t-test. e, Phase distribution at the first peak
timing of average signals in the posterior and anterior PSM. At least 100
cells were examined for each genotype. *P < 0.05, ***P < 0.001,
****P < 0.0001, Rayleigh test. f, Kuramoto order parameter calculated using
the phase shown in e. Error bars indicate s.e.m. *P < 0.05, unpaired t-test.
g, Tail-bud tissue was cultured for 24 h before dissociation. After
dissociation, cells were cultured on fibronectin-coated plates in the presence
of 0.5 μM latrunculin A (lat A). Scale bars, 100 μm. h, Examples of signals
from the Hes7-Achilles reporter from regions of interest in dissociation
cultures of PSM cells. i, Examples of signal of the Hes7-Achilles reporter in
dissociation culture of PSM cells. j, Oscillation period of fluorescence of
the Hes7-Achilles reporter in dissociated PSM cells. k, Oscillation amplitude
from fluorescence of the Hes7-Achilles reporter in dissociated PSM cells. At
least 100 cells were examined for each genotype. Error bars indicate s.e.m.
*P < 0.05, unpaired t-test.
Fig. 3 | Loss of Lfng affects timing information in cell-cell signal
transmission. a, Wild-type PSM cells expressing Achilles-HES7 or expressing
both Achilles-HES7 and H2B-mCherry were mixed at a 20:1 ratio. b, Wild-type
PSM cells (white) were mixed as a minority in Lfng-knockout cells (pink) in a
1:20 ratio. c, Lfng-knockout PSM cells (pink) were mixed as a minority in
wild-type cells (white) in a 1:20 ratio. Fluorescence was quantified over time
in the minority and majority cells. Only representative cells, as well as the
population average, are shown (middle panels). The distribution of phase
difference between the minority cells and their neighbouring cells was
calculated at each time point (right panels). At least 150 minority cells were
examined in 4 independent experiments for each mixture. ****P < 0.0001,
Rayleigh test.


into the Lfng-knockout cell population (‘wild type in Lfng-knockout’),
the wild-type cells expressed a normal level of HES7 and maintained
roughly the same pace as Lfng-knockout cells (Fig. 3b, middle). The
accuracy was decreased in this condition (Fig. 3b, right) compared
with coupling between wild-type cells (Fig. 3a), but this is most
probably due to the fluctuation of inputs from neighbouring
Lfng-knockout cells. Thus, DLL1 signals from Lfng-knockout cells were
transmitted to wild-type cells. However, wild-type cells exhibited an
advance in peak phase of about 0.25π (corresponding to about 15 min),
as compared to Lfng-knockout cells (Fig. 3b, right). This phase advance
in wild-type cells compared to Lfng-knockout cells indicated that
DLL1-Notch signal transmission from Lfng-knockout cells is faster than
that from wild-type cells, suggesting that the absence of LFNG shortens
the sending process in Notch signalling. By contrast, when mixing a small
ratio (1:20) of Lfng-knockout cells into a wild-type population
(‘Lfng-knockout in wild type’), HES7 oscillations in Lfng-knockout cells
showed lower amplitudes and did not keep phase well with wild-type cells,
which indicates that the Lfng-knockout cells did not respond properly
to DLL1 signals from wild-type cells (Fig. 3c) and suggests that LFNG
regulates the amplitude of HES7 oscillations in the receiving process
of Notch signalling. These data indicate that LFNG has dual functions:
delaying the signal-sending process and increasing the amplitude in
the signal-receiving process.
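As a rough check on the numbers quoted above, a phase shift converts to a time shift by scaling with the oscillation period. The sketch below assumes a period of about 120 min, which is the value implied by equating a 0.25π advance with roughly 15 min; the function name and the period are illustrative assumptions, not values taken from the analysis code.

```python
import math


def phase_shift_to_minutes(phase_shift_rad: float, period_min: float) -> float:
    """Convert a phase shift in radians into a time shift in minutes."""
    return phase_shift_rad / (2.0 * math.pi) * period_min


# Assumed period of about 2 h (120 min); an illustration, not a measurement.
advance_min = phase_shift_to_minutes(0.25 * math.pi, period_min=120.0)
print(f"phase advance of 0.25*pi is about {advance_min:.0f} min")
```

With a different period the same phase advance would map to a proportionally different time shift, so the 15-min figure should be read together with the period of the culture in question.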

The coupling observed in the wild type in Lfng-knockout condition (but
not in the Lfng-knockout in wild type condition) could be due to
asymmetric coupling of PSM cells, in which faster oscillators (such as
Lfng-knockout cells) can accelerate slower oscillators (such as wild-type
cells), whereas slower oscillators cannot decelerate faster oscillators.
To exclude this possibility, we co-cultured wild-type PSM cells with
mutant PSM cells that exhibited faster HES7 oscillation by deletion
of two introns from the Hes7 gene (In(3); Extended Data Fig. 8). This
analysis showed that slower wild-type oscillators can decelerate a small
ratio (1:20) of faster mutant oscillators (Extended Data Fig. 8b),
indicating that the phase advance in the wild type in Lfng-knockout
condition is not due to asymmetric coupling.

We further examined the role of Lfng in cell-cell coupling mediated
by Notch signalling, using the recently developed optogenetic
sender-receiver system. In this system, expression of the Notch ligand
DLL1 is optogenetically induced in sender cells, and the response in
receiver cells is monitored using a Hes1 reporter (Fig. 4a). In these
cells, endogenous Hes1 expression oscillates with an approximately 2-h
periodicity, similar to Hes7 oscillations in the PSM. Sender and receiver
cells were co-cultured and, after optogenetic induction of Dll1
expression, expression of the Hes1 reporter in receiver cells was
monitored using photomultiplier tubes. The presence of LFNG in DLL1
signal-sending cells increased the time required for the Hes1 response
(compare lanes 1 and 2 or lanes 4 and 5 in Fig. 4b, top; Fig. 4c) and
decreased the amplitude in receiver cells (Fig. 4b, bottom). The delayed
Hes1 response was almost the same, irrespective of whether Lfng
expression was sustained or oscillatory (compare lanes 2 and 3 or lanes
5 and 6 in Fig. 4b, top). We also found that the transport of DLL1
protein to the cell surface was delayed by about 15 min in the presence
of Lfng compared to the absence of Lfng (Fig. 4d-h). However, the
half-life of DLL1 protein was not affected by LFNG (Fig. 4i). By
contrast, LFNG in receiver cells did not affect the delay (compare lanes
1 and 4 in Fig. 4b, top), but increased the amplitude of the Hes1
response (compare lanes 1 and 4 in Fig. 4b, bottom). Thus, LFNG
increases both the delay in the signal-sending process and the amplitude
in the signal-receiving process, which agrees well with the results of
the wild-type and Lfng-knockout mixed-cell-culture experiments.

Mathematical modelling (Extended Data Fig. 9a-c) suggests that the
coupling delay (τ₂), the time required for Hes7 from one cell to repress
Hes7 in its neighbouring cell, is very important for the dynamics of
in-phase oscillations. The in-phase oscillations are severely dampened
when this delay is decreased or increased, which disrupts cell-cell
synchrony (compare τ₂ = 1.0 with other τ₂ values in Extended Data
Fig. 9d) and approaches a condition known as amplitude or oscillation
death (Extended Data Fig. 9e), in which the expression becomes steady
(non-oscillatory). We speculate that by increasing the time required
for intercellular DLL1-Notch signal transmission, LFNG may adjust the
coupling delay to make it suitable for robust in-phase oscillations. It
has previously been shown that the expression level of the Notch
intracellular domain, which is formed upon activation of Notch
signalling, oscillates in the PSM in an Lfng-dependent manner and that
sustained expression of Lfng downregulates endogenous Lfng expression,
which suggests that LFNG is involved in the downregulation of Notch
signalling. However, the average levels of HES7 expression decreased
in the anterior Lfng-null PSM (Fig. 2d). Furthermore, it has previously
been shown that sustained Lfng expression does not abolish the cyclic
expression of endogenous Hes7 in the PSM. Thus, the repressor role
of Lfng in the PSM remains obscure, and our data suggest that LFNG
does not inhibit Notch signalling but rather increases the amplitude
and the coupling delay (Fig. 4b).

To address the importance of the coupling delay in synchronized 
oscillations, we performed chemical library screening with PSM-like 
tissues derived from embryonic stem cells (ES cells) to search for
small molecules that could ameliorate the Lfng-knockout phenotype. 
Because the coupling delay decreased in the absence of Lfng, chemicals 
that increase the coupling delay may, at least partially, rescue the Lfng- 
knockout phenotype. Such chemicals would slightly increase the period 
of HES7 oscillations in wild-type cells (Extended Data Fig. 9e), although 
mechanisms other than the coupling delay could also affect the oscilla- 
tory period. We screened 431 compounds that target mainly signalling 


122 | Nature | Vol580 | 2 April 2020 


[Fig. 4 graphic, panels a-i: sender-receiver scheme (receiver reporter pHes1-Ub-NLS-Luc2); normalized amplitude (AU) and normalized photon counts (AU) against time after light pulse (min), with -Lfng or +Lfng in sender; image series at 0, 42, 84, 126, 168, 210 and 252 min after light pulse.]

Fig. 4 | LFNG in sending cells lengthened the time required for Hes1 response
to DLL1. a, C2C12 myoblast sender cells carried the hGAVPO-based
optogenetic Dll1-inducible system, whereas C2C12 myoblast receiver cells
carried the Hes1-UbLuc2 reporter. These cells were co-cultured, and Hes1
reporter expression was monitored after light-induction of DLL1. b, Top,
averages of peak timings in Hes1 reporter signals were compared between
receiver cells with and without Lfng. Bottom, averages of amplitude in Hes1
reporter signals divided by mean signal intensity were compared between
sender and receiver cells with and without Lfng. Oscillatory Lfng
(light-inducible Lfng) expression was also induced in sender cells. n > 20
for each combination. c, Representative time series of Hes1 reporter signal
in receiver cells co-cultured with sender cells that express Dll1 with or
without Lfng. d, DLL1-Luc2 fusion protein was expressed in C2C12 cells with
or without Lfng using the hGAVPO-based optogenetic inducible system.
Golgi-mCherry (mC)-2a-mem-iRFP670 was also expressed as a marker for image
segmentation. e, DLL1-Luc2-expressing cells were co-cultured with wild-type
C2C12 cells at 1:4 ratios. Luminescence, iRFP670 and mCherry signals were
imaged with a charge-coupled device (CCD) camera after blue-light
illumination. Snapshots of cells from multicolour imaging are shown. Scale
bar, 50 μm. f, Representative time series of DLL1-Luc2 images after light
pulse. Scale bar, 50 μm. g, Normalized DLL1-Luc2 signals at plasma membrane
(iRFP+ mCherry−) after light pulse. h, Peak timings of DLL1-Luc2 signals
after light pulse. Average peak timing from three independent experiments
is shown. i, Half-life of DLL1-Luc2 in the presence or absence of Lfng.
Average half-life from three independent experiments is shown. Error bars
indicate s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001, unpaired t-test.


and gene regulation, and found that 26 of them increased the period of
Hes7 oscillations by more than 10 min in PSM-like tissues derived from
ES cells (Supplementary Table 1). Two of them (norcantharidin and
kenpaullone) regulate WNT signalling, which is known to have crosstalk
with Notch signalling. Thus, we analysed additional WNT signalling
regulators and found that KY02111 (N-(6-chloro-2-benzothiazolyl)-
3,4-dimethoxybenzene-propanamide), kenpaullone, IWR-1 and C59
increased the coupling delay in the optogenetic sender-receiver system
(Extended Data Fig. 10a, b). Kenpaullone significantly decreased the
amplitude, but the others did not (Extended Data Fig. 10c). Among these
compounds, KY02111 did recover the amplitude and synchrony of HES7
oscillations of Lfng-knockout PSM cells to some extent (Extended Data
Fig. 10d-g), which suggests that this compound can partially rescue the
amplitude and synchrony of HES7 oscillations in Lfng-knockout PSM
cells by lengthening the coupling delay.

In summary, we have established a powerful live-cell imaging 
method that enables the quantification of oscillatory dynamics with 
single-cell resolution. Using this method, we have demonstrated how 
a phase delay can affect the collective dynamic oscillatory expression
of genes. Although the pulsatile expression of the Notch ligand
DLL1 can incompletely entrain oscillations in neighbouring cells, the
synchrony critically depends on the coupling delay (Extended Data
Fig. 9e). Our findings showed that LFNG is a key coupling factor that 
may make the delay of intercellular DLL1-Notch signal transmission 
suitable for robust synchronous oscillation. Furthermore, because 
Lfng mutations cause spondylocostal dysostosis, our study also raises 
the possibility that small compounds that correct the coupling delay 
have the potential to be used for treatment of this congenital disease. 
Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Generation of Achilles 

Venus was used as a starting template for PCR-based site-directed and
semi-random mutagenesis with degenerate primers. Amplified cDNAs
were subcloned in-frame into the BamHI and EcoRI sites of pRSET, and
constructed vectors were transformed into Escherichia coli JM109(DE3).
Colonies were screened for fluorescence using a trans-illuminator.
Fifteen positions (Ser30, Tyr39, Gln69, Cys70, Ile128, Asp129, Tyr145,
Asn146, Ser147, His148, Lys166, Ile167, Arg168, His169 and Ala206) were
investigated and a variant with Ser30Arg, Tyr39Ile, Gln69Ala, Cys70Val,
Ile128Ser, Asp129Gly, Tyr145Phe and Ala206Phe was chosen as Achilles.
The nucleotide sequence reported in this paper has been deposited
in the DDBJ/EMBL/GenBank under the accession number LC381432
(Achilles).


In vitro characterization of fluorescent proteins 

JM109(DE3) cells expressing His-tagged fluorescent proteins were
grown at 37 °C on a rotary shaker at 180 rpm for 17 h in LB medium.
The bacteria were collected and resuspended in PBS with 10 mg/ml
lysozyme and protease inhibitors (10 μM E-64, 10 μM leupeptin and 1 μM
pepstatin A) and lysed by freeze-thaw cycling and sonication. Protein
purification from the supernatant was carried out using Ni-NTA agarose,
followed by buffer exchange into 50 mM HEPES-KOH (pH 7.4) using a
PD-10 column (GE Healthcare). Absorption and fluorescence spectra
were measured using a spectrophotometer (U-3310, Hitachi) and a
multimode microplate reader (Synergy Mx, BioTek), respectively. The
molar extinction coefficient was calculated with protein concentrations
measured using a Bradford protein assay kit (Bio-Rad), with BSA as the
standard. Absolute fluorescence quantum yields were measured using
an integrating sphere (C9920, Hamamatsu) with a multichannel analyser
(C10027, Hamamatsu). A pH titration experiment was performed
using buffers containing 25 mM of acetate (pH 4.0, 4.5 or 5.0), MES (pH
5.5, 6.0 or 6.5), HEPES (pH 7.0, 7.5 or 8.0) or borate (pH 8.5, 9.0, 9.5
or 10.0).


Imaging of bacterial colonies 

Time-lapse imaging of transformed E. coli colonies was carried out
using our homemade fluorescence analysing system, which consists of a
Xenon light source (MAX-301, Asahi Spectra) and a cooled CCD camera
(CoolSNAP HQ, Photometrics) controlled by MetaMorph (Universal
Imaging). The 480AF30 (Omega Optical) and PBO54.0/020 (Asahi Spectra)
filters were used for excitation and emission, respectively. The same
amount of competent JM109(DE3) cells was used for transformation
with the pRSET,-Achilles and pRSET,-Venus genes. After 3 h incubation
at 37 °C, the plate was placed in a stage-top incubation chamber (IBC,
Tokai Hit) kept at 37 °C and time-lapse imaging was immediately started.
Images were analysed using ImageJ (National Institutes of Health) and
the five-parameter sigmoidal curve (SigmaPlot (Systat Software)) gave
the best-fit curve for the time-course data.


Fluorescence measurement of synthesized proteins 

Achilles and Venus cDNAs were inserted into the BamHI and EcoRI sites 
of pCS2 with a partial Kozak sequence CCACCATGG. The plasmids were 
linearized with NotI and mRNAs were synthesized using an mMESSAGE
mMACHINE SP6 kit (Ambion). Protein synthesis was started by adding
the synthesized mRNA to a cell-free protein-synthesizing system
(PUREfrex 2.0, Gene Frontier). The reaction mixture was placed in a
microplate reader (Synergy Mx, BioTek) at 37 °C and the fluorescence 
was monitored with excitation and emission wavelengths at 480 nm and 
530 nm, respectively. The five-parameter sigmoidal curve (SigmaPlot 
(Systat Software)) gave the best-fit curve for the time-course data. 


Generation of Hes7-Achilles reporter transgenic mice 

The reporter construct design is shown in Extended Data Fig. 3.
Venus-Hes7 and Achilles-Hes7 transgenes were generated as follows. The
XhoI-Kozak-Venus-Hes7 fragment was amplified by PCR, and then inserted
between the genomic fragment of the Hes7 promoter and the 3’ UTR,
which were used in the pHes7-UbLuc transgene. Transgenic mice were
generated by injecting the linearized constructs without backbone 
sequences into the pronuclei of fertilized eggs of ICR mice. All mice 
were handled in accordance with the Kyoto University Guide for the 
Care and Use of Laboratory Animals. Genotyping was performed using 
the following primers: forward, 5’-CGACC ACTAC CAGCA GAACA-3’; 
reverse, 5’-ATCCT CACTC CTAGT CCACA GAG-3’. 


Explant culture 
Male mice carrying the Hes7-Achilles transgene were mated with wild-type
ICR female mice, and then female mice at day 10 of pregnancy were
killed. For live imaging that aimed at cell tracking and subsequent
single-cell quantification, mice carrying the Hes7-Achilles transgene were
crossed with mice of the ROSA26-H2B-mCherry line. Embryos were
dissected out in DMEM/F12 with 15 mM HEPES (Gibco) supplemented
with 100 units per millilitre penicillin, 100 μg/ml streptomycin (Nacalai
Tesque) and 0.2% BSA (Sigma). Culture medium for whole PSM tissues
consisted of DMEM/F12 (Cell Culture Technologies) plus 1% BSA, 2 mM
L-glutamine (Gibco), 1 g/l glucose (Wako) and 15 mM HEPES (Nacalai
Tesque). For whole PSM cultures, tail regions including PSM and 2
or 3 formed somite pairs were embedded in 0.15% (for wide-field) or
0.3% (for confocal) low-melting-point agarose (SeaPlaque GTG, FMC)
diluted in culture medium. The gel was set in a silicon ring attached
onto a 35-mm glass-bottomed dish (14-mm diameter, Matsunami).
Culture medium for tail buds was 5% CO₂-equilibrated DMEM/F12 (Cell
Culture Technologies) plus 1% BSA, 2 mM L-glutamine, 0.1 g/l glucose
without HEPES, which was basically the same as has previously been
established. For tail-bud culture, a glass-bottomed dish was coated
with 50 μg/ml fibronectin (Sigma) diluted in PBS for 2 h on a 35-°C hot
plate. Tail-bud regions were excised and put onto a fibronectin-coated
glass-bottomed dish with the anterior side down. Whole PSM tissues
and tail-bud explants were maintained in a humidified chamber at 37 °C
in 5% CO₂ and 80% O₂, or in 5% CO₂, respectively. To perturb Notch
signalling, a 5 μM DAPT treatment or acute knockdown of Lfng was performed.
For acute knockdown of Lfng, two short hairpin RNAs (shRNAs)
targeting mouse Lfng mRNA (Lfng shRNA no. 1:
GCATAGCCTCTCCGAGTACTTTCAAGAGAAGTACTCGGAGAGGCTATGCTTTT; Lfng
shRNA no. 2:
CCCCTGAGCTATGGCATGTTTGAGAATCAAGAGTTCTCAAACATGCCATAGCTCAGGGTTTT)
and scrambled shRNA
(GCCCGTTATCGCACTGATTCATCAAGAGTGAATCAGTGCGATAACGGGCTTTT)
were designed and inserted downstream of the human U6 promoter. A
pPGK-iRFP670-NLS expression cassette was also attached to monitor
transfected cells. For electroporation and subsequent imaging, tail-bud
tissues from embryos carrying the Hes7-Achilles transgene and the
Rosa26-H2B-mCherry allele, at embryonic day (E)10, were used and cultured
following a previously established explant culture method. Tail-bud
mesenchyme cells were isolated, placed into an electrode chamber
(CUY505P5, NEPAGENE) filled with 1 μg/ml shRNA-expression plasmid
diluted with Opti-MEM (Thermo Fisher Scientific) and then incubated
for 10 min at room temperature. Two successive poring pulses of 100 V
for 5 ms, and 5 successive transfer pulses of 20 V for 50 ms, were applied
using a NEPA21 Super Electroporator (NEPAGENE). Tissues were then
transferred onto a fibronectin-coated glass-bottomed dish. Time-lapse
imaging was started after 6 h of incubation at 37 °C in 5% CO₂.
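The three shRNA inserts listed above, read with the line-break hyphens and spaces removed, can be checked for the expected hairpin layout: a sense arm, a loop, an antisense arm that is the reverse complement of the sense arm, and a T-run terminator. The following is a minimal sketch of such a check. The loop `TCAAGAG` and terminator `TTTT` are read off the listed sequences and are assumptions, not a stated design rule, and the function names are hypothetical.

```python
# Illustrative check only: loop and terminator are inferred from the sequences.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def split_hairpin(insert: str, loop: str = "TCAAGAG", terminator: str = "TTTT"):
    """Return (sense, antisense) if the insert forms a perfect stem, else None."""
    if not insert.endswith(terminator):
        return None
    body = insert[: -len(terminator)]
    start = body.find(loop)
    while start != -1:
        sense, antisense = body[:start], body[start + len(loop):]
        # The antisense arm must pair with the 3' end of the sense arm.
        if antisense and reverse_complement(antisense) == sense[-len(antisense):]:
            return sense, antisense
        start = body.find(loop, start + 1)
    return None


SHRNA_INSERTS = {
    "Lfng shRNA no. 1": "GCATAGCCTCTCCGAGTACTTTCAAGAGAAGTACTCGGAGAGGCTATGCTTTT",
    "Lfng shRNA no. 2": "CCCCTGAGCTATGGCATGTTTGAGAATCAAGAGTTCTCAAACATGCCATAGCTCAGGGTTTT",
    "scrambled": "GCCCGTTATCGCACTGATTCATCAAGAGTGAATCAGTGCGATAACGGGCTTTT",
}

for name, insert in SHRNA_INSERTS.items():
    arms = split_hairpin(insert)
    print(name, "->", "perfect stem" if arms else "no perfect stem found")
```

A check like this is mainly useful for catching transcription errors in a sequence before ordering oligonucleotides; it says nothing about knockdown efficiency.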


Live imaging 

Confocal imaging was performed on a Zeiss LSM780 upright (for whole
PSM culture), or inverted (for tail-bud culture) laser-scanning
microscope. A 20x water immersion lens and a 40x oil immersion lens were
used for whole PSM culture and tail-bud culture, respectively. Achilles
was excited with a 514-nm Argon laser. Additionally, for multicolour
imaging aimed at cell tracking, mCherry was excited with a 561-nm
diode-pumped solid-state laser. A Z-stack of 20-30 images was taken
with 2-3-μm depth intervals every 180 s (for whole PSM) or 90 s (for tail
bud). Multicolour imaging was performed by simultaneous excitation
using a 514/561-nm laser with 458/514/561/633-nm main beam splitter.
Wide-field live imaging was performed either on an Olympus IX81
equipped with a cooled CCD camera (Princeton Instruments, VersArray
1kb) or an Olympus IX83 equipped with an iKon-M (Andor) CCD camera.
Signals from samples were collected by an Olympus (Tokyo) x10
UPlanApo objective. For bioluminescence imaging, 1 mM D-luciferin
(Nacalai Tesque) was added to culture medium. Signal-to-noise ratios
were increased by 4 x 4 binning and 3-min exposure.


Image processing, cell-tracking and signal quantification 

For confocal images, the mCherry channel was used for cell tracking and
signal normalization. Raw images were smoothed by Savitzky-Golay
temporal filter with 5-frame window size and subjected to tracking
by TrackMate in Fiji/ImageJ. Parameters such as mean intensity and
position in x-y-z directions for each cell at each time frame were taken
from a 6-μm-diameter circle at the centre of each cell. Further signal
analysis was performed with custom-made programs in Matlab. Mean
intensity in the Achilles channel was divided by mCherry intensity for
normalization. To detrend time-series data, a trend line was drawn by
taking the moving average of the signal with a window size of 240 min
and then subtracted from the normalized signal. Savitzky-Golay filtering
with third order and window size 60-80 min was applied to smooth
the signal. Hilbert transform was performed to obtain instantaneous
oscillation phase. Period and amplitude were quantified by peak
detection on detrended and smoothed intensity. The definition of amplitude
was the same as previously described. For bioluminescence imaging,
spike noise induced by cosmic rays was removed. The spatiotemporal
pattern was obtained by averaging the signal along the left-to-right
axis for each time point, and was then aligned in temporal sequence.
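The normalization, detrending and phase-extraction steps described above can be sketched in a few lines. This is not the custom Matlab code used for the analysis. It is a dependency-free Python illustration that assumes a 3-min frame interval, uses a centred moving average whose window shrinks at the edges, and builds the analytic signal with a plain discrete Fourier transform. The Savitzky-Golay smoothing step is omitted for brevity, and all function names are hypothetical.

```python
import cmath
import math


def moving_average(x, window):
    """Centred moving average; the window shrinks near the two ends."""
    half = window // 2
    out = []
    for i in range(len(x)):
        segment = x[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def instantaneous_phase(x):
    """Phase (rad) of the analytic signal, via a plain O(n^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0  # keep the DC and Nyquist terms unchanged
        elif k < (n + 1) // 2:
            gain = 2.0  # double the positive frequencies
        else:
            gain = 0.0  # discard the negative frequencies
        spectrum[k] *= gain
    analytic = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]
    return [cmath.phase(z) for z in analytic]


def reporter_phase(achilles, mcherry, frame_min=3.0, trend_window_min=240.0):
    """Normalize by mCherry, subtract a moving-average trend, return the phase."""
    normalized = [a / m for a, m in zip(achilles, mcherry)]
    window = int(round(trend_window_min / frame_min))
    trend = moving_average(normalized, window)
    detrended = [v - t for v, t in zip(normalized, trend)]
    return instantaneous_phase(detrended)
```

Because the moving-average window is truncated at the ends of a trace, phase values from roughly the first and last half-window are less reliable than those from the interior and are best excluded from downstream statistics.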


Quantification of synchronization and statistical analysis 

To evaluate whether a population of oscillators were synchronized,
we applied the Rayleigh test to the phase distributions constructed
from the single-cell traces of the phase information, as previously
described. Oscillation dynamics of population averages were quantified
by taking the average signal in the whole area, and processing this
signal in the same way as for the single-cell data to obtain the
instantaneous phase. Relative phase shift from the collective oscillation
for each cell was quantified by calculating the phase difference between
the phases of neighbouring cells and the single-cell phase. To compare
the synchronization efficiency, the Kuramoto order parameter
was determined as previously described. The order parameter was
calculated using the relative phase shift. The anisotropy of phase data
was assessed by Rayleigh test.
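For orientation, the two synchrony measures named above can be written compactly. The sketch below uses the textbook definition of the Kuramoto order parameter (the length of the mean resultant vector of the phases) and the leading-order large-sample approximation p ≈ exp(−nR²) for the Rayleigh test. Whether the cited procedure uses exactly these forms is an assumption, and the function names are illustrative.

```python
import cmath
import math


def kuramoto_order_parameter(phases):
    """R = |mean(exp(i*phase))|: 1 for identical phases, near 0 for uniform ones."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def rayleigh_test(phases):
    """Return (z, p) with z = n * R**2 and the approximation p ~ exp(-z)."""
    n = len(phases)
    r = kuramoto_order_parameter(phases)
    z = n * r * r
    return z, math.exp(-z)


# Hypothetical relative phase shifts (rad) of a tightly clustered population.
clustered = [0.10, -0.05, 0.20, 0.00, -0.15, 0.05, 0.12, -0.08]
print(kuramoto_order_parameter(clustered), rayleigh_test(clustered))
```

A small p-value rejects uniformity of the phase distribution, whereas the order parameter quantifies how tightly the phases cluster; the two are therefore reported side by side.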


Mixture experiments 

A posterior half of the PSM was dissociated mechanically by pipetting
up to 30 times, filtered through 10-μm-pore cell strainer, and seeded
into a silicon ring with 1.5-mm diameter and 2-mm height set in a
glass-bottomed dish coated with fibronectin. Majority cells expressing
Achilles-HES7 and minority cells expressing both Achilles-HES7 and
H2B-mCherry were mixed at a 20:1 ratio. Cells were maintained in the
culture medium used in tail-bud culture, plus 10 μM Y-27632 (Wako). The
oscillation phases in minority and majority cells were quantified by
Hes7-Achilles reporter signal in the mCherry+ or mCherry− area, respectively.


Single-cell isolation culture 
We followed previously described methods, with some minor
modifications. Tail-bud regions were treated in Accutase (Nacalai Tesque)
for 5 min on a 35-°C hot plate, and ectodermal tissues were removed using
a tungsten needle. Explant tissue was cultured on fibronectin-coated
chamber cover glass (Laboratory-Tek) for 24 h in explant medium
consisting of DMEM 4.5 g/l glucose (Thermo Fisher no. 31053) plus 15% FCS
(ES-cell-screened, Hyclone), 2 mM L-glutamine (Gibco), 100 U penicillin,
100 mg/ml streptomycin (Nacalai Tesque), 1x non-essential amino acids
(Gibco), 10 mM HEPES (Nacalai Tesque), 0.1 mM of β-mercaptoethanol
(Gibco), 3 μM Chir-99021 (Sigma no. SML1046), 200 nM LDN-193189
(StemRD no. LDN-02), 2.5 μM BMS-493, 50 ng/ml mFGF4 (R&D),
1 mg/ml heparin (Sigma) and 10 μM Y-27632 (Wako). Explant tissue was then
detached using a P20 tip, collected in a 1.5-ml tube and dissociated by
pipetting, filtered through a 10-μm cell strainer, seeded onto 1%
BSA-coated chamber cover glass and maintained in explant medium plus
0.5 μM latrunculin A (Wako no. 125-04363).


C2C12 sender-receiver assay 

C2C12 cells with a light-inducible Dll1 (sender) and pHes1-NLS-UbLuc
reporter (receiver) have previously been established. Various
sender-receiver lines were newly established by introducing constructs
with Lfng expression cassettes into the original sender or receiver line.
All plasmids were based on the Tol2 transposon vector system (a gift from
the Kawakami Laboratory). To establish stable cell lines, 0.5 μg
pCAGGS-mT2TP, 0.125 μg pKYK34-pEFs-Puro and 0.375 μg
pKYK28-pPGK-Dll3-HA-pPGK-iRFP670-NLS or
pKYK29-pPGK-Dll3-HA-pPGK-iRFP670-NLS-pPGK-Lfng-Flag was
transfected into original sender or receiver line, cultured in a 12-well
plate at 5x10‘ cell density using ViaFect transfection reagent (Promega).
Cells were expanded and selected by 2 μg/ml puromycin for one week.
iRFP670+ cells were then sorted using FACSAria III (BD Biosciences). Then,
1.25 x 10° of sender cells and 0.25 x 10° of receiver cells were mixed and
plated onto black 24-well plates, and photon-counting measurements were
performed every 3 min with 5-s blue-light exposure. Light stimuli were
applied every 2.5 h with 30-s duration. Recorded traces were detrended and
then smoothed using a Savitzky-Golay filter.


Time-lapse imaging of DLL1-Luc2 fusion protein in C2C12 cells 
C2C12 cells that carry the light-inducible DLL1-Luc2 fusion protein
system and the Dll3 and Golgi-mCherry-2a-mem-iRFP670 expression
system, with or without the Lfng expression vector, were established,
and the luciferase activity in iRFP+ mCherry− regions was quantified.


Culture of PSM-like tissue derived from ES cells, and chemical 
library screening 

PSM-like tissues (iPSM colonies) were induced from mouse ES cells that
carry the Hes7-UbLuc reporter, as previously described. A single iPSM
colony per well was cultured in gelatin-coated, black 24-well plates, and
each small compound was added from day 4 onward.
Hes7-promoter-driven luciferase activity was measured by a highly
sensitive photomultiplier tube. Small compounds that lengthened the period
of Hes7 oscillations (Supplementary Table 1) were chosen for further
analyses.


Mathematical modelling 

The HES7 level of cell iis described by X,(0 (in whichi=1, 2,...,36andt 
is time in hours). Here, 7, is the time required for Hes7 to affect its own 
formation inthe same cell through negative feedback. The interaction 
between cells is simplified in the following manner. DI/1 is inhibited by 
Hes7 inthe same cell, and activates Hes7 in other cells. We regard this 
interaction as the mutual inhibition between two cells with delay rT, in 
Hes7 dynamics (Extended Data Fig. 9b). Thus, Tt, represents the time 
required for Hes7 from one cell to repress Hes7 in its neighbouring 
cell. In dynamical equations of the model (Extended Data Fig. 9c), the 
interpretations of parameters areas follows: vis the maximum synthe- 
sis rate; ris the degradation rate; K, and K, correspond to the typical 
amounts of HES7 that account for the repression; and mand nare the 
Hill coefficients. N(i) represents the set of cells that neighbour cell i.In 
numerical simulations, we set v=10, r=2, K,=1, K,=2, m=2,n=2and 
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T= 0.75, and observed the dependence of dynamical behaviour on T,. 
Thesame random initial condition was used for all cases. In parameter 
space for in-phase oscillation, tT, values longer or shorter than 1.0 result 
insmaller amplitudes and larger phase differences. The t, dependence 
of oscillation amplitude (X,,,,) and dispersion among cells (Xqi,) are 
defined as follows. The oscillation amplitude X,,,,(é) of cell iis defined 
as the difference between the maximum and minimum X,(0 values for 
t,<t<t, in which ¢,=100 and ¢, = 200. Xn, is their average: 


1 36 
Xamp = 36 2 Kamp) 
i= 


Xi, is the standard deviation of X,(t) - X(t) for t,<t<t,: 


Xais =) 


12% 4 ty ees 
ae J, oo X (ode 
X_dis should be compared with X_amp: a smaller X_dis/X_amp value indicates
better synchronization.
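The two summary measures defined above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code: the function name, the synthetic sinusoidal traces and the 0.1-h sampling step are assumptions made for illustration, and the time integral in X_dis is approximated by pooling squared deviations over the sampled time points.

```python
import math


def amplitude_and_dispersion(X, times, t1=100.0, t2=200.0):
    """Return (X_amp, X_dis) for traces X[i][k] sampled at times[k]."""
    window = [k for k, t in enumerate(times) if t1 < t < t2]
    n_cells = len(X)
    # X_amp(i) is max minus min of cell i inside the window; X_amp is the mean.
    x_amp = sum(
        max(X[i][k] for k in window) - min(X[i][k] for k in window)
        for i in range(n_cells)
    ) / n_cells
    # Population mean at each sampled time point.
    mean_at = {k: sum(X[i][k] for i in range(n_cells)) / n_cells for k in window}
    # X_dis: root-mean-square deviation from the population mean, pooled over
    # cells and time samples (a discrete stand-in for the time integral).
    squared = sum(
        (X[i][k] - mean_at[k]) ** 2 for i in range(n_cells) for k in window
    )
    x_dis = math.sqrt(squared / (n_cells * len(window)))
    return x_amp, x_dis


def synthetic_cells(phase_spread, n_cells=36):
    """Hypothetical test traces: sinusoids with evenly spaced phase offsets."""
    return [
        [5.0 + 2.0 * math.sin(2 * math.pi * t / 2.5 + phase_spread * i / n_cells)
         for t in TIMES]
        for i in range(n_cells)
    ]


TIMES = [0.1 * k for k in range(2001)]  # 0 to 200 h in 0.1-h steps (assumed)
amp_sync, dis_sync = amplitude_and_dispersion(synthetic_cells(0.0), TIMES)
amp_spread, dis_spread = amplitude_and_dispersion(synthetic_cells(2 * math.pi), TIMES)
print(dis_sync / amp_sync, dis_spread / amp_spread)
```

In this toy case the ratio X_dis/X_amp is essentially zero for identical traces and clearly larger when the phases are spread over a full cycle, which is the sense in which a smaller ratio indicates better synchronization.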


For a time series of X_i(t) at different τ_2 values, the average HES7 level
(Extended Data Fig. 9d) is calculated as:


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The nucleotide sequence for Achilles cDNA has been deposited in the
DDBJ/EMBL/GenBank under the accession number LC381432. Raw data
for Achilles and all the other experiments are available on request from
A.M. and the corresponding author, respectively. Correspondence and
requests for materials should be addressed to A.M.
(matsushi@brain.riken.jp) for Achilles cDNA and R.K.
(rkageyam@infront.kyoto-u.ac.jp) for other materials.


Code availability 


Image processing and analysis were performed using Fiji (v.1.0) and 
Matlab (R2018a). Subsequent analysis was performed using custom 
Matlab scripts. The codes are available upon request from the cor- 
responding author. 
Extended Data Fig. 1 | Loss of Lfng affects Hes7 oscillation dynamics at a
tissue level. a, pHes7-UbLuc imaging in wild-type and Lfng-knockout PSM.
Spatiotemporal patterns along the anterior–posterior axis are shown. Top is
anterior. b, Period of Hes7 oscillations in the anterior and posterior PSM (n = 4
PSM samples). c, Amplitude of Hes7 oscillations (n = 4 PSM samples). Error bars
indicate s.e.m. *P < 0.05, unpaired t-test.
Extended Data Fig. 2 | Comparative characterization of Achilles versus
Venus. a, Absorption (abs) spectra of Achilles (red) and Venus (black).
b, Fluorescence images of bacteria that express Achilles and Venus. Bacterial
colonies were grown at 37 °C and photographed at 8, 12 and 20 h after
transformation. Exactly the same number of competent bacterial cells was
used for transformation. Scale bar, 5 mm. c, Time course of fluorescence
intensities of transformed E. coli colonies (mean values ± s.e.m. from three
experiments). The data were normalized to the final yields extrapolated by
curve fitting (broken line). d, Comparison of properties of Achilles and Venus.
Extended Data Fig. 3 | Schematic structures of fluorescent reporters for
HES7. a, Venus was inserted between the 5-kb Hes7 promoter and the Hes7 gene
to drive expression of the Venus–HES7 fusion protein. b, Achilles was inserted
between the 5-kb Hes7 promoter and the Hes7 gene to drive expression of the
Achilles–HES7 fusion protein. c, Achilles fused to NLS-hPEST is expressed
under the control of the Hes7 promoter. d, Hes7 cDNA without an initiation
codon was inserted between the PEST sequence and the Hes7 3′ UTR of the
construct shown in c to enable the transcripts to mimic endogenous mRNA
stability. e, The Hes7 gene (exons + introns) without an initiation codon was
inserted between the PEST sequence and the Hes7 3′ UTR of the construct
shown in c. f, Achilles fused to NLS-hCL1-hPEST is expressed under the control
of the Hes7 promoter. g, Hes7 cDNA without an initiation codon was inserted
between the PEST sequence and the Hes7 3′ UTR of the construct shown in f.
Extended Data Fig. 4 | The Achilles–HES7 fusion protein is functional in
segment formation. a, Bone and cartilage were stained with Alizarin red and
Alcian blue, respectively, at post-natal day (P)0. Achilles–HES7 rescued the
abnormal vertebra and rib formation seen in the Hes7-null background.
b, Higher magnification of the thoracic-to-lumbar area in Hes7-Achilles
transgene+, Hes7-null mouse in a. Scale bars, 5 mm.
Extended Data Fig. 5 | Observation of oscillation dynamics at the single-cell
level to analyse the phase-coupling mechanism. a, Live imaging (wide-field)
of PSM carrying the Hes7-Achilles reporter at E10.5. b, Spatiotemporal
expression pattern of signals from the Hes7-Achilles reporter in the PSM
(wide-field). c, A representative cell tracked by Fiji and TrackMate.
d, A representative phase quantification. Fluorescence time series from a cell
extracted by tracking was converted into phase information using Hilbert
transform. e, HES7 oscillation phase, colour-mapped onto the original image.
Scale bars, 100 μm.
Extended Data Fig. 6 | Synchronization of HES7 oscillation in tail-bud tissue
cultures. a, Expression of Hes7-Achilles reporter in wild-type and Lfng-knockout
tail-bud tissue cultures. Scale bar, 100 μm. b, Mean intensity of Hes7-Achilles
reporter fluorescence in the whole area. c, Examples of time series of
Hes7-Achilles reporter intensity from single-cell tracking data. d, e, Average
period (d) and amplitude (e) of HES7 oscillations at a single-cell level. More than
30 cells for each genotype (control and two independent reporter lines) were
examined. n, number of peak pairs used for quantification. Error bars indicate
s.e.m. *P < 0.05, unpaired t-test. f, Distribution of phase in single cells at the
timing of peaks, in mean intensity time series in tail-bud cultures. Control and
two independent reporter lines were examined. The number of cells examined
(n) is indicated. ***P < 0.001, Rayleigh test. g, Kuramoto order parameter
calculated using Achilles–HES7 oscillation phase quantified in f. Error bars
indicate s.e.m. *P < 0.05, unpaired t-test.
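The Kuramoto order parameter named in panel g is the length of the mean unit phase vector, R = |(1/N) Σ_j exp(iθ_j)|. The following Python sketch shows that standard definition only; the function name and the example phases are hypothetical, and the legend does not describe how the authors computed it in practice.

```python
import cmath
import math


def kuramoto_order_parameter(phases):
    """Return R = |mean(exp(i*theta))| for phases given in radians.

    R is close to 1 when all cells share a phase and close to 0 when the
    phases are spread evenly around the cycle.
    """
    if not phases:
        raise ValueError("at least one phase is required")
    return abs(sum(cmath.exp(1j * theta) for theta in phases) / len(phases))


# Hypothetical single-cell phases (radians).
in_phase = [0.3] * 50
evenly_spread = [2 * math.pi * k / 50 for k in range(50)]

r_in_phase = kuramoto_order_parameter(in_phase)
r_spread = kuramoto_order_parameter(evenly_spread)
print(r_in_phase, r_spread)
```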
Extended Data Fig. 7 | Acute inhibitor or knockdown treatment of tail-bud
and dissociated PSM-cell cultures. a–c, Expression of the Hes7-Achilles
reporter in wild-type tail-bud tissue cultures treated with DMSO control (grey
bars) or the Notch inhibitor DAPT (red bars). Period (a), amplitude (b) and
synchrony (c) of HES7 oscillations were quantified. Error bars indicate s.e.m.
[…] reporter wild-type tail-bud tissue cultures treated with scrambled shRNA
(shScramble) (grey bars) or two different shRNAs against Lfng (shLfng-1 and
shLfng-2) (blue bars). Synchrony (e) and Kuramoto order parameter
(f, t = 600–900 min) of HES7 oscillations were quantified. The number of cells
examined (n) is indicated. ****P < 0.0001, Rayleigh test (e). Error bars
indicate s.e.m.
Extended Data Fig. 8 | Mixed cultures of wild-type PSM cells and PSM
cells carrying a faster Hes7 oscillator. Wild-type (period = 126.6 ± 2.0 min)
and mutant (In(3)) PSM cells that carry a faster Hes7 oscillator
(period = 115.4 ± 1.1 min) were mixed as a minority in mutant or wild-type cells
at 1:20 ratio, and fluorescence in the minority and majority cells was quantified
over time. a, A small ratio (1:20) of In(3) cells were mixed into an In(3)
population. b, A small ratio (1:20) of In(3) cells were mixed into a wild-type
population. c, A small ratio (1:20) of wild-type cells were mixed into an In(3)
population. The distribution of phase difference between the minority cells
and their neighbouring cells was calculated at each time point. At least 100 cells
were examined for each genotype. ****P < 0.0001, Rayleigh test.
line is the average HES7 level (see ‘Mathematical modelling’ in Methods). In the
Extended Data Fig. 10 | KY02111 partially rescued the amplitude and
synchrony of HES7 oscillations in Lfng-knockout PSM cells. a, Effect of
WNT-signalling-related chemical compounds on DLL1–Notch signalling delay was
examined by a sender–receiver assay in C2C12 cells. Representative time series
of the Hes1 reporter signal in receiver cells after light induction of Dll1 in the
presence of DMSO, KY02111, kenpaullone or norcantharidin are shown. b, Peak
timings of the Hes1 reporter after blue-light stimulation. n > 10 measurements
for each condition. c, Fold change of amplitude of the Hes1 reporter after
blue-light stimulation. n > 10 measurements for each condition. Error bars indicate
s.e.m. *P < 0.05, unpaired t-test. d, Quantification of Hes7-Achilles reporter
signals in central area (containing posterior PSM identity) of wild-type and
Lfng-knockout tail-bud cultures in the presence of 0.1% DMSO (control),
KY02111, kenpaullone or norcantharidin. e, Distribution of phase in single cells
at the timing of peaks in mean intensity time series, in Lfng-knockout tail-bud
cultures in the presence of DMSO (control) or KY02111. The number of cells
examined (n) is indicated. ***P < 0.001, ****P < 0.0001, Rayleigh test. f, Average
amplitude of HES7 oscillations in Lfng-knockout tail-bud cultures in the
presence of DMSO (control) or KY02111. Error bars indicate s.e.m. *P < 0.05,
unpaired t-test. g, Kuramoto order parameter calculated using Achilles–HES7
oscillation phase quantified in e. Error bars indicate s.e.m. *P < 0.05, unpaired
t-test.
Pluripotent stem cells are increasingly used to model different aspects of
embryogenesis and organ formation. Despite recent advances in in vitro induction of
major mesodermal lineages and cell types, experimental model systems that can
recapitulate more complex features of human mesoderm development and 
patterning are largely missing. Here we used induced pluripotent stem cells for the 
stepwise in vitro induction of presomitic mesoderm and its derivatives to model 
distinct aspects of human somitogenesis. We focused initially on modelling the 
human segmentation clock, a major biological concept believed to underlie the 
rhythmic and controlled emergence of somites, which give rise to the segmental 
pattern of the vertebrate axial skeleton. We observed oscillatory expression of core 
segmentation clock genes, including HES7 and DKK1, determined the period of the 
human segmentation clock to be around five hours, and demonstrated the presence 
of dynamic travelling-wave-like gene expression in in vitro-induced human presomitic 
mesoderm. Furthermore, we identified and compared oscillatory genes in human and 
mouse presomitic mesoderm derived from pluripotent stem cells, which revealed 
species-specific and shared molecular components and pathways associated with the 
putative mouse and human segmentation clocks. Using CRISPR-Cas9-based genome 
editing technology, we then targeted genes for which mutations in patients with 
segmentation defects of the vertebrae, such as spondylocostal dysostosis, have been 
reported (HES7, LFNG, DLL3 and MESP2). Subsequent analysis of patient-like and 
patient-derived induced pluripotent stem cells revealed gene-specific alterations in 
oscillation, synchronization or differentiation properties. Our findings provide 
insights into the human segmentation clock as well as diseases associated with human 
axial skeletogenesis. 


We initially aimed to mimic and recreate in vitro the signalling events
responsible for the stepwise emergence of presomitic mesoderm
(PSM) and its derivatives during embryonic development, as also
recently attempted by others, via selective activation and inhibition
of appropriate signalling pathways, using human induced pluripotent
stem cells (iPS cells) as the starting material (Fig. 1a). We characterized
the ability of our in vitro-induced human PSM cells to differentiate into
somitic mesoderm and its two main derivatives: sclerotome, which
gives rise to bone and cartilage of the axial skeleton, and dermomyotome,
which gives rise to skeletal muscle and dermis of the emerging
embryo. RNA-sequencing (RNA-seq) analysis and subsequent characterization
of in vitro-derived human PSM samples revealed that
at each step of our induction and differentiation protocol, markers
expected to be present (on the basis of either embryological studies
in animal models or recent reports using stem cells) were robustly
and appropriately expressed at both transcript and protein levels
(Fig. 1b–f, Extended Data Fig. 1, Supplementary Table 1), indicating
that our stepwise approach follows the developmental trajectory and
recapitulates ontogeny seen during embryonic somitic mesoderm
development.
Fig. 1 | Molecular and functional analysis of human PSC-derived PSM.
a, Schematic overview of stepwise induction and differentiation of PSM from
human PSCs. The ‘i’ suffix indicates inhibition of a pathway. DM,
dermomyotome; PS, primitive streak; PSM, presomitic mesoderm; SCL,
sclerotome; SM, somitic mesoderm. b, RNA-seq data of stepwise-induced PSM
and derivatives. Fragments per kilobase of transcript per million mapped reads
(FPKM) values for each gene were normalized to mean of all samples (n = 3).
c, Principal component analysis (PCA) plot with suggested developmental
trajectory highlighted in pink (n = 3 experiments (Ex1–Ex3)). d, Representative
flow cytometric evaluation of DLL1 and TBX6 protein expression in iPS cells and
PSM (n = 3 experiments). e, Expression of TBX6 transcript during different
stages of human PSM induction and differentiation. Quantitative PCR with
reverse transcription (RT-qPCR) results of four independent experiments with
the 201B7 cell line are shown; mean ± s.d., n = 4. f, Representative expression of
TBX6 protein in human PSM. Left, entire well. Right, enlarged view of bound
area. n = 3 experiments. Scale bar, 100 μm. g, Oscillation of HES7 reporter
activity in 2D culture of induced PSM. Signal was normalized to maximum
oscillation peak. Period was calculated as average peak-to-peak interval using
1st to 5th peaks. n = 3. h, Synchronization of HES7 reporter activity in spreading
PSM spheroid. Left, PSM spheroid attached to dish 9 h after the start of the
experiment. Right, kymograph along the yellow line shown in left, bottom. n = 3
experiments. Scale bar, 500 μm. Data, images and graphs shown in b–d, f and h
are representative of three independent experiments.


Oscillations in human in vitro PSM


We detected expression of TBX6 and DLL1, two well-established
markers of PSM, at both transcript and protein level in our in vitro-derived
human PSM samples (Fig. 1d–f, Extended Data Fig. 1d–f). We
also detected specific and high-level expression of HES7, a known
regulator of the segmentation clock in murine PSM, in the human
iPS cell-derived PSM (Fig. 1b, Extended Data Fig. 1c). On the basis of
these observations, we generated a luciferase-reporter iPS cell line
for human HES7-promoter activity (HES7 reporter). We observed clear
oscillation of the HES7 reporter in induced human PSM in 2D culture
(Fig. 1g) and determined the period of the in vitro human segmentation
clock to be around five hours (Fig. 1g, Extended Data Fig. 2), which is
similar to the four- to six-hour period reported for somite formation
in primary human embryo samples and oscillation in human
mesenchymal cells.

We then investigated whether it was possible to observe travelling-wave-like
expression, which is caused by synchronization among oscillations
in neighbouring cells. This has been reported in the context
of explant studies using reporter mice and mouse embryonic stem
cell-derived PSM, but, to our knowledge, has never been observed
in human PSM. We induced PSM fate in a 3D culture of human iPS cells
and allowed spheroids to spread on a culture dish. In the spreading
PSM spheroid, we observed sustained oscillation and the clear presence
of travelling waves (Supplementary Video 1), also indicated by
the tilted slope in corresponding kymographs (Fig. 1h, Extended Data
Fig. 2c). The periods of the in vitro human segmentation clock did not
differ between the two different assay conditions (2D oscillation versus
3D synchronization assay) but remained stable at around five hours
(Extended Data Fig. 2a, b).
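The Fig. 1g legend states that the period was calculated as the average peak-to-peak interval using the 1st to 5th peaks. A minimal Python sketch of that calculation is given here under stated assumptions: the peak detector is a naive local-maximum test on a clean trace, and the 5-min sampling step and the damped 300-min test signal are hypothetical rather than taken from the paper.

```python
import math


def find_peaks(signal):
    """Indices of local maxima (no smoothing; assumes a clean trace)."""
    return [
        k for k in range(1, len(signal) - 1)
        if signal[k - 1] < signal[k] >= signal[k + 1]
    ]


def mean_peak_interval(times, signal, first=1, last=5):
    """Average interval between consecutive peaks, from peak `first` to `last` (1-based)."""
    peaks = find_peaks(signal)
    if len(peaks) < last:
        raise ValueError("not enough peaks in the trace")
    chosen = [times[k] for k in peaks[first - 1:last]]
    return (chosen[-1] - chosen[0]) / (len(chosen) - 1)


# Hypothetical reporter trace: damped oscillation with a 300-min (5-h) period.
times = [5.0 * k for k in range(400)]  # minutes
signal = [
    math.exp(-t / 1500.0) * math.cos(2 * math.pi * (t - 75.0) / 300.0)
    for t in times
]
period_min = mean_peak_interval(times, signal)
print(period_min / 60.0, "hours")
```

On real luminescence data some smoothing or detrending would normally precede peak detection; that step is omitted here because the text does not specify it.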


Derivatives of human in vitro PSM 

To ensure that our in vitro-derived PSM is comparable to its in vivo 
counterpart, we assessed its capacity to differentiate to somitic meso- 
derm, sclerotome and dermomyotome. To induce somitic mesoderm, 
we mimicked the decrease in FGF and Wnt activity along the posterior–anterior
axis of the PSM, as reported in the embryonic context, by
simultaneous inhibition of both pathways, leading to rapid and robust
induction of somitic mesoderm expressing TCF15, a well-established
marker of somite development, at both transcript (Fig. 1b, Extended
Data Fig. 1c) and protein level (Extended Data Fig. 3a, b). MESP2, a
marker of segmentation, showed weak expression in induced human 
somitic mesoderm (Extended Data Fig. 3b). Even though our system 
exhibited robust differentiation towards somitic mesoderm, seg- 
mentation or formation of somite-like structures were not observed. 
Dermomyotome and sclerotome cells derived from in vitro-induced
human somitic mesoderm expressed appropriate developmental stage- 
specific markers such as PAX7 and PRRX1 (dermomyotome) or FOXC2 
(sclerotome) at the transcript and protein level (Fig. 1b, Extended Data 
Figs. 1c, 3c). Sclerotome derived from induced human somitic meso- 
derm also differentiated into bone and cartilage, and demonstrated 
endochondral bone formation upon transplantation in vivo, albeit 
lacking any apparent macroscopic segmental or vertebral patterns
(Extended Data Fig. 4). Further, in vitro-derived human dermomyotome 
cells displayed robust in vitro induction to skeletal muscle cells accom- 
panied by expression of muscle markers at the protein level (Extended 
Data Fig. 5a, b). Functional analysis of induced human sclerotome and 
dermomyotome cells derived from a calcium-reporter iPS cell line 
(Gen1C) revealed the reproducible presence of contracting skeletal 
muscle cells and contractile bundles after three weeks of in vitro 2D 
differentiation culture of dermomyotome cells (Extended Data Fig. 5c, 
d, Supplementary Video 2). We thus showed that our in vitro-induced 
human PSM-like cells can be robustly differentiated further and give 
rise to functional somitic mesoderm derivatives including sclerotome 
and dermomyotome. 


Human and mouse in vitro segmentation clocks 


Next, we investigated whether we could use this experimental system 
to increase our understanding of the human segmentation clock and of 
oscillation and synchronization in human PSM. We collected samples 
of human induced PSM during oscillation by monitoring the oscilla- 
tory activity of the HES7 reporter and performed RNA-seq analysis 
(Extended Data Fig. 6a). Next-generation sequencing of the different 
PSM time points revealed a core set of about two hundred oscillating 
genes in human in vitro PSM (Fig. 2a, Supplementary Table 2). Path- 
way and Gene Ontology (GO) analysis of the identified gene clusters 
revealed that, in addition to enrichment of pathway members previously
associated with the segmentation clock, such as Notch, Wnt or
FGF signalling, novel pathways were also represented in our dataset,
including oscillating genes associated with TGF-β, PI3K, ephrin,
histone deacetylase and Hippo signalling (Fig. 2b, Supplementary
Table 3). Consistent with previous reports in mouse PSM, treatment 
with inhibitors of Notch, FGF or Wnt pathways decreased the intensity 
of the HES7 reporter in human in vitro-derived PSM (Extended Data 
Fig. 6b, c, Supplementary Video 3). Of note, we observed oscillatory 
activity of core components of the Hippo pathway and YAP signalling 
(TEAD4 and AMOTL2) (Fig. 2a-c), which were recently reported to be 
important regulators of oscillatory activity in mouse PSM (see also
Supplementary Discussion 1). 

To identify putative human-specific and evolutionarily conserved
components of the in vitro segmentation clock, we then analysed and
compared our data with mouse epiblast stem cell (EpiSC)-derived
in vitro mouse PSM, applying a similar strategy for induction and analysis
as for the human cells. Mouse EpiSC-derived PSM showed a segmentation
clock period of two to three hours, confirming earlier in vivo and
in vitro mouse studies, as well as indicating that the obtained human
data are reflective of the in vivo condition. Next-generation sequencing
analysis of mouse PSM time points revealed a set of about 170 oscillating
genes in mouse in vitro-derived PSM, including both novel and previously
reported oscillating components of the mouse segmentation
clock (Extended Data Fig. 7a, b, Supplementary Table 4).

Fig. 2 | Identification of phase and antiphase oscillating genes of in vitro
human and mouse segmentation clocks. a, Heat map of normalized gene
expression levels for oscillating genes in human in vitro-derived PSM. Data
shown for two independent biological datasets with 16 samples each.
Identified phase- and antiphase-oscillating genes are highlighted on the right;
unambiguously phase- or antiphase-oscillating genes highlighted on the left;
solid and dotted black lines indicate unambiguous and ambiguous genes,
respectively. b, Ingenuity pathway analysis (IPA) result of human phase (yellow)
and antiphase (blue) oscillating genes. c, Validation of RNA-seq results by
RT-qPCR for selected phase- and antiphase-oscillating genes with specific
oscillatory expression in human iPS cell-derived PSM but not mouse
EpiSC-derived PSM. Data shown for two independent biological datasets with 16
samples each. See also Extended Data Fig. 6d. d, RT-qPCR validation of phase-
and antiphase-oscillating genes found to oscillate in mouse EpiSC-derived
PSM. The same genes show oscillation in human PSC-derived PSM (see also
Extended Data Fig. 6e, f). Data in c and d represent mean values of three
technical replicates of two independent biological experiments.


GO-term and pathway analysis of the identified oscillating mouse
genes revealed that major pathways identified in the human model were
also present in the mouse model (Fig. 2b, Supplementary Table 5), with
some species-specific differences in individual oscillating members of
the same pathways (see also Supplementary Discussion 2). Comparison
of the human and mouse oscillating gene sets further revealed the presence
of genes oscillating in phase with HES7, including the circadian
clock gene PER1, which, to our knowledge, has not been linked to the
segmentation clock (Extended Data Fig. 6e, f), as well as genes showing
clear antiphase oscillatory expression (for example, DKK1). The phase
cluster of human and mouse oscillating genes contained genes associated
with the Notch pathway, such as LFNG, whereas the antiphase
cluster contained negative-feedback regulators associated with the Wnt
pathway, such as DKK1 and SP5 (Fig. 2a, b, Extended Data Figs. 6f,
7a), as previously reported for posterior PSM of mouse embryos.
Generating a dual luciferase-activity-based reporter cell line for DKK1
and HES7 promoter activities, we confirmed clear phase and antiphase
reporter oscillations in human iPS cell-derived PSM (Extended Data
Fig. 7d), suggesting that our induced PSM may represent posterior
immature PSM rather than anterior mature PSM.


Analysis of knockout human in vitro PSMs 

Fig. 3 | Functional evaluation of targeted disruption of selected
segmentation clock genes in human in vitro PSM. a, Two-dimensional
oscillation assay for wild-type (WT) and HES7-knockout (KO) PSMs. Signal was
normalized to the maximum oscillation peak. b, Two-dimensional oscillation
assay for LFNG-, DLL3- and MESP2-knockout PSMs. c, Three-dimensional
synchronization (spheroid-spreading) assay for knockout PSMs. Kymograph
along the yellow line in Supplementary Video 4 is shown. Scale bar, 500 μm.
d, Damping rate of oscillation amplitude in knockout PSMs. The detrended
signal shown in Extended Data Fig. 8b was normalized to the first oscillation
peak, and the value of each peak is shown. Data are mean ± s.d., n = 3. P values
are from two-sided Dunnett's test. *P < 0.05, **P < 0.01, ***P < 0.001; NS, not
significant. Data, graphs and images shown in a–c are representative of three
independent experiments.


As our assay systems were capable of assessing both oscillation and
synchronization (Extended Data Fig. 7e) as well as identifying molecular
features of the segmentation clock in induced PSM (Fig. 2, Extended
Data Figs. 6, 7), we next investigated whether they could be used to
model anomalies of human axial skeletogenesis, such as segmentation
defects of the vertebrae (SDV), which are known to be caused by mutations
in genes associated with the segmentation clock (for example,
HES7, LFNG, DLL3 and MESP2). We used CRISPR-Cas9 technology
to generate knockout reporter iPS cell lines with frameshifts or deletion
mutations in these target genes (Extended Data Fig. 8a) and analysed
their putative loss-of-function effect on oscillatory HES7 reporter activity.
Knockout of endogenous HES7 itself led to clear loss of oscillatory
activity of the HES7 reporter in the 2D oscillation assay (Fig. 3a), similar
to previous embryological studies using knockout mice. Knockout
reporter cell lines for LFNG, DLL3 or MESP2 continued to show strong
oscillatory HES7 activity (Fig. 3b), even though knockout mice for LFNG
and DLL3 have been reported to show defective oscillation patterns.
We reasoned that in our 2D oscillation assay, the phase (that is, timing)
of oscillations is initially reset by medium change, resulting in collective
oscillation even in the absence of a strong synchronization mechanism.
We then examined the synchronization ability of knockout reporter
cell lines for the above genes using the 3D synchronization
(spheroid-spreading) assay of human induced PSM (Fig. 3c). The healthy control
(wild type) and the knockout reporter cell line for MESP2 produced
sustained oscillations and occasional travelling waves (Supplementary
Video 4), indicating intact synchronization among neighbouring cells.
By contrast, in lines with LFNG or DLL3 knockout, oscillation damped
quickly and clear travelling waves were not observed (Fig. 3c, d,
Extended Data Fig. 8b, Supplementary Video 4). We interpreted this
rapid loss of oscillatory activity as a sign of diminished synchronization.
Unlike the 2D oscillation assay, spheroids in the 3D synchronization
assay spread dynamically on the culture dish, and cell movements
desynchronized oscillation phases. Without a proper synchronization
mechanism, collective oscillation was quickly lost, even though oscillations
in individual cells continued. Thus our 2D and 3D assay systems
using induced human PSM were able to detect defects in oscillation and
synchronization, respectively (Extended Data Fig. 8c).
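The damping measure described in the Fig. 3d legend (peak values of the detrended signal, normalized to the first oscillation peak) can be sketched in Python as follows. This is an illustration under assumptions: the input is taken to be already detrended, peaks are found with a naive local-maximum test, and the exponentially damped test trace with its 5-min sampling is hypothetical.

```python
import math


def normalized_peak_values(detrended):
    """Peak heights of a detrended trace, each divided by the first peak height."""
    peaks = [
        k for k in range(1, len(detrended) - 1)
        if detrended[k - 1] < detrended[k] >= detrended[k + 1]
    ]
    if not peaks or detrended[peaks[0]] <= 0:
        raise ValueError("trace has no positive first peak")
    first = detrended[peaks[0]]
    return [detrended[k] / first for k in peaks]


# Hypothetical detrended trace: 300-min period, amplitude decaying with a
# 600-min time constant, sampled every 5 min.
times = [5.0 * k for k in range(300)]
trace = [
    math.exp(-t / 600.0) * math.cos(2 * math.pi * (t - 75.0) / 300.0)
    for t in times
]
peak_values = normalized_peak_values(trace)
print([round(v, 3) for v in peak_values])
```

A line with intact synchronization would keep these values near 1 across peaks, whereas a rapidly damping line, as reported for the LFNG and DLL3 knockouts, would show a steep fall from the first peak onwards.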

Flow cytometric and transcriptome analysis showed no major dif- 
ferences between control cells and knockout reporter cell lines (HES7-, 
DLL3-, LFNG- and MESP2-knockout) at the iPS cell and PSM stages 
(Extended Data Fig. 8d, e). PSM-induction efficiency was high and com- 
parable to healthy control cells in HES7-, DLL3- and MESP2-knockout 
reporter cell lines, and slightly reduced in LFNG-knockout iPS cell lines 
(Extended Data Fig. 8d). There were few differences in gene expression 
at the iPS cell and PSM stages when comparing knockout cell lines 
with the original healthy donor line, with HES7, MESP2 and LFNG show- 
ing higher expression in HES7-knockout-derived PSM, as previously 
also shown in mice (Extended Data Fig. 8e). Together, these results
underline the overall value of a higher-order assay system that can 
assess gene or protein expression as well as more complex features 
such as oscillation or synchronization in human in vitro-derived PSM,
thus making it possible to decipher functionally relevant and possibly 
disease-associated features specific to each loss- or gain-of-function 
mutation, which would otherwise remain inaccessible. 


Analysis of patient-derived in vitro PSM 


To evaluate the utility of our model system to both assess key features
of the human segmentation clock and address molecular mechanisms
associated with human diseases affecting axial skeletogenesis, we
generated a HES7 reporter cell line with a point mutation (rs113994160:
c.73C>T) causing a pathogenic missense mutation R25W in the
helix–loop–helix domain of HES7, previously reported to cause
spondylocostal dysostosis (SCD) and SDV in humans. HES7(R25W)-homozygous
mutants were created in the HES7 reporter using single-stranded donor
oligonucleotides (ssODN) templates (Fig. 4a, Extended Data Fig. 9a). In
addition to this patient-like reporter cell line, we also derived iPS cells
from patients showing clinical features of SCD, including segmentation
defects along the entire spine and bilateral fusion of ribs (Extended
Data Fig. 9, Supplementary Note 1). Following initial quality control
and validation of the patient-derived iPS cell lines, named SCDP1 and
SCDP2 (Extended Data Fig. 9b–f), we evaluated their in vitro-differentiation
ability towards PSM together with the HES7(R25W) cell line. All
three patient-like and patient-derived iPS cell lines showed high induction
efficiency towards PSM, as assessed by flow cytometric analysis
of DLL1 expression (Fig. 4b, Extended Data Fig. 10a), indicating that
their initial capacity to differentiate to PSM is not altered. We then
performed 2D oscillation assays with all three cell lines and observed
clear loss of oscillation for the HES7(R25W) point-mutation line, similar
to that observed for the HES7-knockout cell lines. Conversely, SCDP1
and SCDP2 iPS cell-derived PSMs showed sustained oscillation, with
SCDP1 also showing sustained oscillation in 3D assay (Fig. 4c, Extended
Data Fig. 10b, c).

To expand our analysis of the patient-derived cell lines, we set out 
to determine putative underlying pathological mutations in both iPS 
cell lines. For the SCDP1 patient-derived cell line, we identified—via 
exome-sequencing—compound heterozygous variants in MESP2, 
c.258-261delCAGC (p.E88Gfs*31, rs1452984345) and c.307G>T (p.E103*, 
1871647808). The first variant results in a frameshift that produces a 
truncated protein in the middle of the DNA-binding domain, and the 
second variant has previously been reported as a pathogenic founder 


Nature | Vol580 | 2April 2020 | 127 


Article 


Fig. 4 | In vitro recapitulation and molecular analysis of disease-phenotypes using patient iPS cells and isogenic controls. a, Overview of ssODN-based targeting strategy for generation of a point mutation (HES7R25W) reporter cell line. b, Evaluation of DLL1-positive PSM-induction efficiency of HES7R25W and iPS cell lines from patients with SCD (SCDP1 and SCDP2). n = 3 experiments. c, Two-dimensional oscillation assay for HES7R25W, SCDP1-A and SCDP2-E PSM (HES7 reporter activity against time in min). Signal was normalized to the maximum oscillation peak. n = 3 experiments. d, Heat map of putative genes related to SCD at the somitic mesoderm stage. Genes were upregulated or downregulated in the SCDP1-A and SCDP1-F patient cell lines and increases or decreases were inhibited in rescued cell lines (SCDP1-resA1, F2 and F3). RNA-seq data for two different MESP2-knockout cell lines (no. 7 and 11) are included. e, RT-qPCR-based validation of RNA-seq results. Data are mean ± s.d., n = 3. f, Three-dimensional synchronization assay of SCDP2 patient (SCDP2-E) and isogenic rescue (SCDP2-resE17/43)-line-derived PSMs. n = 3 experiments. Kymographs of Supplementary Video 5 are shown. g, Quantification of HES7 reporter activity in patient and rescue PSMs. Value of each oscillation peak is shown calculated from Extended Data Fig. 12f. Data are mean ± s.d., n = 3 experiments. P values are from two-sided Dunnett's test. Data, images and graphs shown in b, c, e and f are representative of three independent experiments.


variant of SCD in a Puerto Rican population® (Extended Data Fig. 10d,
Supplementary Note 2). The clear oscillatory activity of HES7 in the
oscillation assay of the SCDP1 cell line harbouring MESP2 loss-of-function
mutations (Fig. 4c) was similar to our observations for the
human MESP2-knockout reporter cell lines (Fig. 3b). When assessed 
with the 3D synchronization assay, the SCDP1 patient cell line exhibited 
sustained collective oscillation and occasional travelling waves, indi- 
cating an intact synchronization mechanism (Extended Data Fig. 10b, 
c), similar to the results seen for the human MESP2-knockout iPS cell 
lines (Fig. 3c, d). 


Altered gene expression in patient-derived cells 


To facilitate the molecular and functional analysis of the SCDP1 patient 
cell line, we generated isogenic controls by correcting the underlying
predicted pathogenic mutations by gene targeting with CRISPR-Cas9. 
Allele-specific gene correction of MESP2 was achieved using single 
guide RNAs (sgRNAs) targeting either the c.258_261delCAGC or the 
c.307G>T mutation and homologous recombination with donor vec- 
tors bearing the wild-type MESP2 sequence. Microhomology-assisted 
excision (MhAX) was used to remove the selection cassette” (Extended 
Data Fig. 10e-i), thereby effectively rescuing the disease-causing loss of 
MESP2, albeit heterozygously. Gene-edited iPS cells were confirmed to 
be karyotypically similar to the parental patient iPS cell line (Extended 
Data Fig. 10j-n). As no clear oscillation or synchronization phenotype 
could be observed for the analysed patient cell line, we searched for 
possible differences at the functional or molecular level by comparing 
patient (SCDP1-A and SCDP1-F) and corresponding rescued iPS cell lines 
(SCDP1-resA and SCDP1-resF). To this end, we induced and compared 
the different stages of our in vitro induction and differentiation protocol
using RNA-seq analysis of the patient and rescue cell lines (Fig. 4d,
Extended Data Fig. 11a). 

Comparison of clones from patients with those from healthy con- 
trol and heterozygously corrected cell lines revealed the presence 
of an upregulated gene cluster at the somitic mesoderm stage in the 
analysed patient cell lines, which could be reversed by correction of 
either mutated allele (Fig. 4d). This observed pattern of upregulated 
genes was also shared with somitic mesoderm samples of two different 
MESP2-knockout cell lines subjected to the same type of induction and 
analysis. Genes apparently upregulated in patient somitic mesoderm 
and reduced upon rescue of either MESP2 mutation included FGF4, 
FGF18 and DUSP5 (Fig. 4d, e), suggesting that abnormal FGF signalling
could be a disease-associated molecular feature in SDV. Somitic
mesoderm samples derived from MESP2-knockout iPS cells also showed
higher levels of expression of FGF4, FGF18 and DUSP5 (Fig. 4d, e). It
should be noted that mutations in FGF-pathway components have 
not been reported in SDV, despite extensive investigation of cases by 
whole-exome and genome sequencing. Knockout mouse lines for these 
genes also do not show defects in somite segmentation, but usually 
result in embryonic lethality before somite formation. Of note, EPHA3, 
which was previously reported to have a dominant-negative effect on 
somite patterning and axial organization in fish®, was also upregulated 
in SCDP1 patient- and MESP2-knockout-derived somitic mesoderm 
(Fig. 4d, e). Further, knockout and patient cell lines showed higher 
levels of expression of MESP1 and MESP2 compared with healthy or 
genetically corrected control samples, indicating possible disrupted 
negative-feedback regulation by MESP2. Several other genes associated 
with patterning during somitogenesis, for which genetic mutations in 


patients with SCD were recently reported, including LFNG”°, RIPPLY2** 
and DMRT2*, were also upregulated in somitic mesoderm derived 
from SCDP1 patient cell lines (SCDP1-A and SCDP1-F) harbouring
MESP2 loss-of-function mutations (Fig. 4d, e, Extended Data Fig. 11b),
indicating reciprocal regulatory mechanisms that possibly connect 
these disease-associated genes at the molecular and functional level 
during the pathogenesis of SDV. Whether dysregulation of the previ- 
ously mentioned patterning-associated genes in somitic mesoderm 
is indeed the causative factor leading to the development of SDV in 
patients with MESP2 loss-of-function mutations remains to be eluci- 
dated and is the topic of ongoing research efforts. This will probably 
require the establishment of additional assay systems in which actual 
human somitogenesis—including the formation and patterning of 3D 
epithelial somites—can be achieved and assessed in vitro. 


Synchronization defect in patient PSM 


In addition to SCDP1, we also searched for a disease-causing mutation 
in SCDP2 (Supplementary Note 1a, b), and identified a homozygous 
variant in DLL3 (rs786200899: c.603_604insGCGGT, p.P202Afs*41) 
(Extended Data Fig. 12a, Supplementary Note 2). DLL3 is the most clinically
relevant and frequently mutated gene in SCD”. We performed gene
correction and obtained several isogenic rescue lines in which the DLL3 
mutation was homozygously corrected (Extended Data Fig. 12b-e). 
Assessment of SCDP2-derived PSM with the 2D oscillation assay 
revealed sustained oscillation (Fig. 4c), whereas the 3D synchroniza- 
tion assay of SCDP2 lines showed rapid damping of oscillation (Fig. 4f, g, 
Extended Data Fig. 12f), as also previously shown for DLL3-knockout 
lines (Fig. 3c, d), indicating a defect in the synchronization ability of 
these patient-derived PSM cells. The synchronization phenotype was 
rescued upon isogenic correction of the DLL3 mutation, with strong 
sustained oscillation and occasional travelling waves in the cases of PSM 
derived from the isogenic rescue cell lines (SCDP2-resE17 and SCDP2- 
resE43) (Fig. 4f, g, Extended Data Fig. 12f, Supplementary Video 5). Our 
approach is thus capable of recapitulating a human disease-causing 
phenotype associated with the loss of DLL3 leading to defective syn- 
chronization at the PSM stage. How exactly the loss of synchronization 
is manifested in the abnormal patterning and formation of the develop- 
ing human axial skeleton remains to be determined. 

In summary, we have shown phase and antiphase oscillation and 
travelling-wave-like expression of key segmentation-clock genes in 
human in vitro-derived PSM, and have identified a putative molecular 
network of known and novel genes comprising the human and mouse 
segmentation clocks, with around five-hour and two- to three-hour 
periods in PSC-derived human and mouse PSM, respectively. We 
assessed the function of several disease-linked genes associated with 
the human segmentation clock, by applying our experimental model 
system in combination with patient-like and patient-derived iPS cells, 
thus effectively creating a human pluripotent stem cell-based model 
for SDV, which will further contribute to deciphering the molecular 
mechanisms underlying normal and pathological human axial skel- 
etogenesis. Having access to a robust experimental model system that 
can be easily manipulated without the need for transgenic animals or 
primary tissues, while enabling assessment of genetic, environmen- 
tal or epigenetic factors, will facilitate our molecular and functional 
understanding of the role of the segmentation clock in development
and disease. 
Methods


No statistical methods were used to predetermine sample size. The 
investigators were not blinded to allocation during experiments and 
outcome assessment. 


Pluripotent stem cell generation and culture 

Experiments were performed using mainly two human iPS cell lines 
derived from healthy donors, 1231A3” and 201B7*. Pluripotent stem 
cells of patients with SCD and presenting SDV were induced using 
patient-derived primary cell samples. A primary cell sample from the first
patient with SDV was obtained from the NIGMS Human Genetic Cell
Repository at the Coriell Institute for Medical Research (GM13539) 
and used for the derivation of iPS cell line SCDP1 (Extended Data Fig. 9, 
Supplementary Note 1). Primary tissue samples of a second patient with 
clinical features of SDV, including vertebral segmentation defects along 
the entire spine (C1 to sacrum) and bilateral fusion of ribs posteriorly, 
were obtained in Japan and used for derivation of iPS cell line SCDP2 
(see also Extended Data Fig. 9g, Supplementary Note 1). All experi- 
ments followed relevant guidelines and regulations and were approved 
by ethics committees of the Kyoto University Graduate School and 
Faculty of Medicine, Kyoto, Japan and Meijo Hospital, Nagoya, Japan. 
Experimental use of patient-derived cells was also approved by eth- 
ics committees of the RIKEN BDR (Kobe) and RIKEN IMS (Yokohama). 
Informed consent was obtained from legal guardians of patients by 
relevant institutions. Reprogramming was performed with episomes 
(pCE-hOCT3/4, pCE-hSK, pCE-hUL, pCE-mp53DD, pCXB-EBNA1) under
feeder-free conditions using StemFit medium and laminin-coated 
dishes (iMatrix511)*”. Human iPS cells were maintained without feeder 
cells and cultured on iMatrix-511 silk (Nippi)-coated dishes or plates with
StemFit AK02N (Ajinomoto) medium supplemented with 50 U penicillin
and 50 mg ml⁻¹ streptomycin (Gibco). Utilized iPS cell lines were
regularly tested and reported negative for mycoplasma contamination. 


Stepwise induction of human somitic mesoderm 

Human iPS cells were seeded on iMatrix-511 silk-coated plates or 
dishes at appropriate densities as single cells (for example, 1.3 x 10* 
cells per well into 6-well plates; 8.0 x 10* cells per dish into 10-cm dishes) 
4-5 days before induction. All differentiation and induction steps were 
performed in chemically defined medium with insulin (CDMi)”, unless 
otherwise mentioned. Our stepwise protocol is similar to a recently published
mesoderm induction protocol, with some differences. Human
primitive streak cells were induced by treatment of iPS cells with basic
FGF (bFGF, 20 ng ml⁻¹), CHIR99021 (10 μM) and activin A (50 ng ml⁻¹) for
24 h. PSM cells were induced from primitive streak cells by exposure
to SB431542 (10 μM), CHIR99021 (3 μM), LDN193189 (250 nM) and
bFGF (20 ng ml⁻¹) for 24 h. Subsequently, somitic mesoderm cells were
induced from PSM cells using PD173074 (100 nM) and XAV939 (1 μM)
for 24 h. For details of the recombinant human proteins and small
molecule agonists or inhibitors, see Supplementary Table 6. Further 
details of applied in vitro induction and differentiation protocols are 
shared in the open protocol repository Protocol Exchange*®. 


Human sclerotome and dermomyotome induction 

Following initial stepwise somitic mesoderm induction, human sclerotome
cells were induced with a combination of smoothened agonist
(SAG, 100 nM) and LDN193189 (600 nM)" for 72 h. Dermomyotome cells
were induced from human somitic mesoderm as previously described‘,
using a combination of CHIR99021 (3 μM), GDC0449 (150 nM) and
BMP4 (50 ng ml⁻¹) for 48 h.


In vitro 3D chondrogenic induction 

Stepwise-induced human sclerotome cells were dissociated using 
Accutase (Life Tech), centrifuged and resuspended in CDMi before 
being seeded (2.0 x 10° cells per well) into 96-well low attachment plates 


containing sclerotome induction medium with ROCK inhibitor Y27632 
(Wako), forming 3D aggregates overnight. Initial 3D sclerotome spheres 
were transferred into low-attachment plates or dishes containing 3D 
chondrogenic induction medium” and cultured under standard condi- 
tions. Medium was changed every three days. 


In vitro skeletal muscle induction 

Dermomyotome cells were dissociated using Accutase (Life Tech), centrifuged,
resuspended in CDMi and seeded (2.5 x 10° cells per well) into
Matrigel-coated 12-well plates in muscle induction medium containing
ROCK inhibitor Y27632 (Wako). To induce human skeletal muscle cells,
we applied the N2 medium established previously with some modifications
(DMEM/F12 (Gibco), 1% insulin-transferrin-selenium (Corning),
1% N2 Supplement (Gibco), 0.2% penicillin/streptomycin (Gibco),
1% L-glutamine (Gibco), 2% horse serum (Sigma-Aldrich)). Medium 
was changed every three days. Calcium imaging of dermomyotome- 
derived skeletal muscle activity in GCaMP-reporter line (Gen1C)* was 
performed using Nikon A1R MP (Multiphoton + N-STORM). 


In vivo xeno-transplantation of PSM derivatives 

Male NOD/ShiJic-scidJcl mice were purchased from CLEA Japan and used 
at six weeks of age. Human sclerotome cells derived from healthy-donor 
(wild type) or homozygous and heterozygous luciferase reporter lines 
(625-A4 and 625-D4) were dissociated using Accutase (Life Tech) and 
resuspended in 100 μl of CDMi before being mixed with the same volume
of Matrigel as previously described*. Numbers of transplanted cells
ranged from ~5.0 x 10° to 1.2 x 10° cells per injection. Cells were injected 
into mice subcutaneously with a 26 G needle and 1-ml syringe (Terumo). 
The resulting cartilage and bone tissues were harvested 2 months after
injection. Bioluminescence images were taken with IVIS Spectrum 
(PerkinElmer). Whole-mount images were taken with LEICA M205FA 
(Leica). Animal experiments were approved by the institutional animal 
committee of Kyoto University and performed in strict accordance 
with the Regulation on Animal Experimentation at Kyoto University. 


Quantitative PCR with reverse transcription 

RNA was extracted with the RNeasy mini kit (Qiagen) following the 
manufacturer’s instructions. cDNA was synthesized using SuperScript
III Reverse Transcriptase (Invitrogen) from 1 μg total RNA. cDNA was
diluted 1:10 in RNase-free water. RT-qPCR was performed using Thunderbird
SYBR qPCR Mix (Toyobo) and QuantStudio 12K Flex Real-Time
PCR System (Thermo Fisher). The expression values of target genes
were normalized by β-actin expression from the same cDNA templates.
For oscillation analyses, fold induction relative to time 0 was calculated
as $2^{-\Delta\Delta C_t}$, where $\Delta\Delta C_t$ values were differences between $\Delta C_t$ values
at time 0 and each time point (technical triplicates). For other analyses,
expression values of each biological replicate were calculated from
technical triplicate or quadruplicate qPCR reactions, and the mean and
s.d. values were determined from the expression values of biological
replicates. Details of the RT-qPCR primers used are listed in Supplementary
Tables 7.1 and 7.2.
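
The fold-induction calculation described above can be sketched as follows. This is only an illustration under stated assumptions: ΔCt is taken as the mean Ct of the target gene minus the mean Ct of the reference gene (β-actin in the text), ΔΔCt is taken as ΔCt at each time point minus ΔCt at time 0, and all Ct values and function names are hypothetical rather than taken from the study.

```python
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """Mean Ct of the target minus mean Ct of the reference (technical replicates)."""
    return mean(target_ct) - mean(reference_ct)


def fold_induction(target_ct_by_time, reference_ct_by_time, baseline=0):
    """Return {time: 2^(-ddCt)}, with ddCt measured relative to the baseline time point."""
    d_ct_baseline = delta_ct(target_ct_by_time[baseline], reference_ct_by_time[baseline])
    result = {}
    for t in sorted(target_ct_by_time):
        dd_ct = delta_ct(target_ct_by_time[t], reference_ct_by_time[t]) - d_ct_baseline
        result[t] = 2 ** (-dd_ct)
    return result


# Hypothetical Ct values (technical triplicates) keyed by time in minutes.
hes7_ct = {0: [25.0, 25.1, 24.9], 60: [24.0, 24.1, 23.9], 120: [26.0, 26.0, 26.0]}
actb_ct = {0: [15.0, 15.0, 15.0], 60: [15.0, 15.0, 15.0], 120: [15.0, 15.0, 15.0]}

folds = fold_induction(hes7_ct, actb_ct)
for t, value in folds.items():
    print(f"t = {t} min: fold induction = {value:.2f}")
```

A Ct that drops by one cycle relative to time 0 corresponds to a twofold induction, and a Ct that rises by one cycle corresponds to a halving.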


Immunocytochemistry 

Cells were fixed with 2% paraformaldehyde (PFA) for 30 min and washed 
twice with PBS. Samples were permeabilized with 0.2% Triton X-100 
(Sigma-Aldrich) in PBS for 10 min at room temperature and then washed 
with PBST (1% Tween 20 (Sigma-Aldrich) in PBS). Subsequently, samples 
were blocked in 5% skim milk for 1 h at room temperature and then stained
with primary antibodies overnight at 4 °C. Samples were then washed
with PBST three times and stained with secondary antibodies for 1 h at
room temperature. Antibodies were diluted in 10% blocking solution
(5% skim milk) in PBST. Samples were then washed with PBST twice and
stained with DAPI for nuclear counterstaining for 5 min at room temperature.
All images were taken using Nikon A1R MP (Multiphoton+N-STORM). For details
of the primary and secondary antibodies used, see Supplementary Table 8.


Histological analysis 

Tissues were fixed with 4% PFA overnight at 4 °C. Fixed samples were 
washed with PBS twice and embedded in paraffin. Sections were sliced 
at 3 um for immunostaining and 5 um for other types of staining. Sec- 
tions were stained with haematoxylin and eosin (H&E), safranin O, von 
Kossa, pentachrome, type I collagen (COL1) antibody, type II collagen
(COL2) antibody and human nuclear antigen (HNA) antibody. Sections
stained with antibodies were incubated overnight at 4 °C. Secondary
antibodies were applied with N-Histofine Simple Stain MAX PO (Nichirei 
Bioscience) for 30 min at room temperature. Signals were detected by 
N-Histofine DAB-3S kit (Nichirei Bioscience). For details of antibodies 
used see Supplementary Tables 8.1 and 8.2. 


Flow cytometric analysis 

Cells were washed with PBS and dissociated using Accutase (Life 
Technologies) and centrifuged. Cells were resuspended (1.0 x 10’ 
cells per ml) in fluorescence-activated cell sorting (FACS) buffer (0.1% 
BSA in PBS) and stained with allophycocyanin (APC)-conjugated DLL1 
antibody for 30 min at 4 °C. Cells were then stained with DAPI to elimi- 
nate dead cells after washing with FACS buffer once and finally strained 
through a filter mesh. As for the co-staining of intracellular molecules 
TBX6 and brachyury (encoded by T) with DLL1, cells were fixed with 4%
paraformaldehyde for 20 min at 4 °C after initial staining with DLL1 
antibody and washed twice with staining medium, which contained 
PBS with 2% FBS. Samples were permeabilized with BD Perm/Wash 
buffer (BD Biosciences) for 15 min at room temperature and stained with 
TBX6 primary antibody or phycoerythrin (PE)-conjugated brachyury 
antibody for 60 min at room temperature and washed with BD Perm/ 
Wash buffer twice. The cells stained with TBX6 antibody were stained 
with Alexa Fluor 488-conjugated secondary antibody for 60 min at 
room temperature. The samples were washed with BD Perm/Wash 
buffer twice and suspended into staining medium. Flow cytometric 
analysis was performed using LSR or BD FACSAria II cell sorter (BD 
Biosciences). FACS data were analysed and graphs were generated 
using FlowJo software (FlowJo). For details of used antibodies see Sup- 
plementary Tables 8.3 and 8.4. 


Reporter constructs 

For the human HES7 reporter, human HES7 promoter (5,937 bp) and 
3’ untranslated region (UTR) were fused to Luciferase2-NLS-d1PEST®. 
For the mouse Hes7 reporter, mouse Hes7 promoter and 3’ UTR were 
fused to Ub-Luciferase2-NLS*. For the dual-reporter assay, the HES7 
promoter and 3’ UTR were fused to NanoLuc-NLS-d1PEST, while human 
DKK1 promoter (2,218 bp) and 3’ UTR were fused to Luciferase2-NLS-d1PEST.
These reporters were integrated into the genome using piggyBac
transposition. See Extended Data Fig. 2a and Extended Data
Fig. 7d for schematic overviews of used reporter constructs. 


Oscillation assay in 2D

Primitive streak and PSM were induced in a stepwise manner as 
described above. Luminescence was measured in the presence of 
D-luciferin (200 μM) with Kronos Dio Luminometer (Atto) from the
start of PSM induction. The obtained signal was detrended with Excel
(Microsoft), and converted to the instantaneous phase with the hilbert
and angle functions of Matlab (Mathworks). For the dual-reporter assay,
HES7 and DKK1 reporter constructs were simultaneously introduced
into the cells, and each luminescence was filtered and measured in the
presence of Endurazine (Live Cell Ex-4377, Promega) (400 nM) and
D-luciferin (1 mM). The dual-reporter cells were seeded on a 35-mm
dish coated with iMatrix-511 at 3,000 cells per dish. After 4 days of culture,
the medium was changed to CDMi containing SB431542 (10 μM),
CHIR99021 (10 μM), DMH1 (2 μM) and bFGF (20 ng ml⁻¹). After an additional
three days of culture, the medium was changed to CDMi without
inhibitors for measurement with Kronos Dio Luminometer (Atto). This


modified one-step protocol” was used only for Extended Data Fig. 7d. 


All other 2D oscillation measurements of human PSC-derived PSM 
were performed using our standard stepwise PSM induction protocol. 
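
The conversion of a detrended luminescence trace into an instantaneous phase can be illustrated with a short standard-library sketch. It is not the authors' Excel and Matlab workflow: the detrending method is not stated in the text, so subtraction of a centred moving average is an assumption, and the analytic signal is built here from a plain discrete Fourier transform as a stand-in for Matlab's Hilbert-transform routine. The synthetic trace, the window length and all function names are hypothetical.

```python
import cmath
import math


def detrend_moving_average(signal, window):
    """Subtract a centred moving average (about `window` samples wide, truncated at the edges)."""
    half = window // 2
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out


def analytic_signal(x):
    """Analytic signal via the DFT: keep DC, double positive frequencies, zero negative ones."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_phase(x):
    """Phase angle (radians, in (-pi, pi]) of the analytic signal at each sample."""
    return [cmath.phase(z) for z in analytic_signal(x)]


# Synthetic trace: an oscillation with a 30-sample period on top of a linear drift.
period = 30
raw = [100.0 + 0.5 * t + 10.0 * math.cos(2 * math.pi * t / period) for t in range(120)]
detrended = detrend_moving_average(raw, window=period)
phase = instantaneous_phase(detrended)
```

For a pure cosine that completes a whole number of cycles, this construction returns a phase that advances linearly from 0 and wraps at ±π, which is the behaviour expected of a Hilbert-based phase estimate.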


Synchronization assay in 3D

To make 3D induced PSM spheroids, HES7 reporter iPS cells were 
seeded into non-adhesive round bottom 96-well plates at 1,000-5,000 
cells per well and cultured in CDMi containing BMP4 (50 ng ml⁻¹),
CHIR99021 (10 μM) and Y27632 (10 μM). After one day of culture, Y27632
was removed. After 18 h of culture, the medium was changed to CDMi
containing DMH1 (2 μM) and CHIR99021 (10 μM). After 6 h of culture,
the spheroid was transferred to a fibronectin-coated glass-bottom
dish with CDMi containing DMH1 (2 μM) and D-luciferin (1 mM), and
luminescence of the spreading spheroid was imaged with a customized
incubator microscope LCV110 (Olympus). The signal was averaged over
the whole area or a region of interest (ROI). When needed, the signal was further
detrended with Excel, and converted to the instantaneous phase with
Matlab. When the signal was weak, the spike noise was removed initially
with ImageJ as described previously*®. Kymographs were made by averaging
signals over ten pixels with Metamorph (Molecular Devices).
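
As a rough illustration of how a kymograph can be built by averaging over a ten-pixel band, consider the sketch below. It does not reproduce the Metamorph procedure: the orientation of the band (ten image rows averaged for each x position), the synthetic image stack and the function name are all assumptions made for the example.

```python
def kymograph(frames, row_start, band=10):
    """Build a space-time image from a stack of 2D frames.

    frames: list of frames, each a list of rows, each row a list of pixel values.
    Returns one line per time point; each value is the mean over `band`
    consecutive rows (starting at row_start) at that x position.
    """
    kymo = []
    for frame in frames:
        rows = frame[row_start:row_start + band]
        if len(rows) != band:
            raise ValueError("band extends beyond the image")
        width = len(rows[0])
        kymo.append([sum(row[x] for row in rows) / band for x in range(width)])
    return kymo


# Hypothetical stack: 3 time points, 12 rows, 4 columns.
# Pixel value = 100 * t + x, plus a row-dependent offset that averaging smooths out.
frames = [
    [[100 * t + x + (y % 2) for x in range(4)] for y in range(12)]
    for t in range(3)
]
kymo = kymograph(frames, row_start=1, band=10)
for t, line in enumerate(kymo):
    print(t, line)
```

Each output line is one time point, so a travelling wave appears as a slanted stripe when the lines are stacked.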


Sampling for RNA-seq analysis of oscillating human genes 

Our standard stepwise PSM-induction protocol was used with the following
modifications. HES7 reporter cells were seeded on a 35-mm dish
coated with Matrigel. At 12 h during the second step (PSM induction),
the cells were split into multiple 35-mm dishes at 4.0 x 10° cells per dish
and cultured in CDMi containing SB431542 (10 μM), LDN193189
(250 nM) and CHIR99021 (3 μM). After 12 h of culture, the medium was
changed to CDMi containing SB431542 (10 μM), LDN193189 (250 nM),
CHIR99021 (3 μM) and bFGF (20 ng ml⁻¹). The luminescence was
continuously monitored with Kronos Dio Luminometer using one sample,
and the other 16 samples were frozen at each time point. 


Induction and sampling of murine EpiSC-derived PSM 

Mouse EpiSCs were obtained from RIKEN BRC (AES0204)* and maintained
on fibronectin-coated dishes with DMEM/Ham F-12 containing
15% knockout serum replacement, nonessential amino acids (0.1 mM),
β-mercaptoethanol (0.1 mM), activin A (20 ng ml⁻¹), bFGF (10 ng ml⁻¹)
and IWR-1-endo (2.5 μM). The mouse EpiSC line was tested and reported
negative for mycoplasma contamination. For murine PSM induction,
EpiSCs were seeded on 35-mm dishes coated with fibronectin and cultured
overnight with the medium containing Y27632 (10 μM), activin A
(20 ng ml⁻¹) and bFGF (10 ng ml⁻¹) but without IWR-1-endo. The medium
was then changed to CDMi containing SB431542 (10 μM), LDN193189
(250 nM), CHIR99021 (10 μM) and bFGF (20 ng ml⁻¹). After 30 h, the
medium was changed again to CDMi containing SB431542 (10 μM),
LDN193189 (250 nM), CHIR99021 (3 μM) and bFGF (20 ng ml⁻¹). The
luminescence was continuously monitored with Kronos Dio Luminometer
using one sample, and the other 16 samples were frozen at
each time point. 


Library preparation for RNA-seq analysis 

Total RNA was extracted using RNeasy mini kit (Qiagen) following the 
manufacturer’s instructions. RNA-seq libraries were generated from 
200-300 ng total RNA with the TruSeq Stranded mRNA LT Sample 
Prep Kit (Illumina) according to the manufacturer's protocol, with the
exception of the libraries used for RNA-seq analysis of human oscillating
genes, which were generated from 120 ng total RNA using the NeoPrep
system (Illumina) following the manufacturer’s instructions. The
obtained RNA-seq libraries were sequenced on NextSeq 500 (75-86 bp 
single-end reads, Illumina). 


RNA-seq data analyses 
The sequenced reads were mapped to the hg38 human reference 
genome plus the luciferase reporter sequence using HISAT2 (v.2.1.0)° 
with the GENCODE v.25 annotation gtf file after trimming adaptor
sequences and low-quality bases by cutadapt-1.14°'. The mapped reads
with high mapping quality (MAPQ ≥ 20) were used for further analyses.
For identification of the differentiation stage-related genes, the differ- 
entially expressed genes (> fivefold changes and q values < 0.05 between 
any pair of samples) were extracted using Cuffdiff? within Cufflinks 
v.2.2.1 package and GENCODE v.25 annotation file, and the extracted 
genes were grouped into six stages based on the maximum expression 
levels (FPKM values determined by Cuffdiff) among the differentiation 
stages. The low expressed genes (< 10 FPKM) across all stages were fil- 
tered out before grouping. For identification of the oscillation genes, 
the uniquely mapped reads were counted and normalized to calculate 
the gene expression levels using HTSeq (v.0.6.1)°? with GENCODE v.25 
annotation gtf file (protein-coding genes) and edgeR (v.3.18.1)™ after 
filtering low-expressed genes (cpm <1) across all conditions in each 
experiment. Rhythmic genes were identified by ARSER (v.2.0)® with 
FDR_BH < 0.03 in both of two independent experiments for the human
case and with FDR_BH < 0.2 in either of two independent experiments for
the mouse case. Genes that ARSER judged to be noise in both
experiments were excluded from the human oscillating genes. For pathway
and Gene Ontology analyses, DAVID web tools® and IPA (Qiagen, https:// 
www.qiagenbioinformatics.com/products/ingenuitypathway-analysis) 
were used with a gene set including both phase and antiphase oscillating 
genes. For identification of the patient (SCDP1-A and SCDP1-F) related
genes, fold changes with q values were calculated with HTSeq (v.0.6.1)" 
with GENCODE v.25 annotation gtf file and edgeR (v.3.18.1)**. The normalized
counts (pseudocounts) determined by edgeR were used as
expression values. The genes whose expression values were upregulated
or downregulated (> threefold changes, q < 0.05, SCDP1 lines versus wild
type), and whose increases or decreases were inhibited (> 50%, q < 0.05) in the
rescued lines, were defined as SCDP1-related upregulated or downregu- 
lated genes, respectively. The genes whose expression levels were low 
(average cpm <5) in both wild-type and SCDP1-lines were filtered out. 
For comparisons of expression profiles between knockout cell lines 
and their parental cell lines (wild type), FPKM values, fold changes and 
q values were calculated using Cuffdiff? within Cufflinks v.2.2.1 package
and GENCODE v.25 annotation file (protein-coding genes). The PCA 
and representation of heat maps and scatter plots were performed 
using R software. 
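
The selection rule for SCDP1-related genes described above can be written out as a small filter. This is a hedged sketch rather than the authors' pipeline: the text does not define how the ">50% inhibited" criterion was computed, so it is assumed here to mean that more than half of the patient-versus-wild-type difference is reversed in the rescued lines. The gene names, expression values and q values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneStats:
    name: str
    wt: float                   # mean expression (cpm) in wild-type lines
    patient: float              # mean expression in SCDP1 patient lines
    rescue: float               # mean expression in rescued lines
    q_patient_vs_wt: float      # q value, patient versus wild type
    q_rescue_vs_patient: float  # q value, rescue versus patient


def classify(g, fold=3.0, q_max=0.05, reversal=0.5, min_cpm=5.0) -> Optional[str]:
    """Return 'up', 'down' or None according to the thresholds quoted in the text."""
    # Drop genes that are lowly expressed in both wild-type and patient lines.
    if g.wt < min_cpm and g.patient < min_cpm:
        return None
    # Both comparisons must be significant.
    if g.q_patient_vs_wt >= q_max or g.q_rescue_vs_patient >= q_max:
        return None
    # More than threefold change between patient and wild type.
    if g.patient > fold * g.wt:
        direction = "up"
    elif g.wt > fold * g.patient:
        direction = "down"
    else:
        return None
    # Assumed definition: fraction of the change that is reversed on rescue.
    reversed_fraction = (g.patient - g.rescue) / (g.patient - g.wt)
    return direction if reversed_fraction > reversal else None


genes = [
    GeneStats("GENE_A", 10.0, 50.0, 20.0, 0.01, 0.02),  # up, 75% reversed
    GeneStats("GENE_B", 60.0, 10.0, 45.0, 0.01, 0.01),  # down, 70% reversed
    GeneStats("GENE_C", 10.0, 50.0, 45.0, 0.01, 0.01),  # up, but only 12.5% reversed
    GeneStats("GENE_D", 1.0, 4.0, 1.0, 0.01, 0.01),     # low expression in both groups
    GeneStats("GENE_E", 10.0, 50.0, 20.0, 0.20, 0.01),  # patient vs wild type not significant
]
calls = {g.name: classify(g) for g in genes}
print(calls)
```

Comparing products instead of ratios (for example `patient > fold * wt`) avoids a division by zero when a gene is undetected in one group.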


CRISPR-Cas9 gene knockout 

Gene knockout was performed using transient transfection of 
pSpCas9(BB)-2A-Puro (PX459) V2.0 (a gift from F. Zhang, Addgene 
plasmid #62988). Oligonucleotides encoding sgRNA protospacer 
sequences (Supplementary Table 7) were annealed and cloned as 
described previously”. sgRNAs were verified by sequencing. Plasmid 
DNA (1 μg) was transfected into iPS cells by electroporation followed
by selection with 0.5 μg ml⁻¹ puromycin for 48 h. Surviving cells were
allowed to recover and then replated at low density before picking 
isolated colonies. For overview of knockout reporter line establish- 
ment and details of sgRNAs used see Extended Data Fig. 8a and Sup- 
plementary Table 7.5. 

Generation of HES7R25W mutant lines
Target-specific gRNA sequences were cloned into pSpCas9(BB)-2A- 
Puro (PX459) V2.0. One-million human HES7 reporter iPS cells were 
electroporated with 3 μg each of targeting vector and ssODN, treated
with 0.5 μg ml⁻¹ puromycin 24 h after transfection for 48 h, and single
clones were Sanger sequenced. Candidate clones were allele resolved
using Zero Blunt TOPO PCR Cloning Kit (Invitrogen), and a clone identified
with one correctly modified allele and one allele containing a 5-bp
deletion was re-targeted with a gRNA specific to the 5-bp deletion, and
single clones were obtained as outlined above. The detailed targeting 
process, used sgRNAs and ssODN templates are shown in Extended 
Data Fig. 9a, Supplementary Tables 7.5 and 7.8. 


Whole-exome sequencing and variant calling 

Whole-exome sequencing was performed as previously described**. In 
brief, DNA (3 μg) was sheared with S2 Focused-ultrasonicator (Covaris)
and processed by SureSelectXT Human All Exon V5 (Agilent Technologies).
Captured DNA was sequenced using HiSeq 2000 (Illumina) with
101-bp paired-end reads with seven indices. Image analysis and base
calling were performed using HCS (v.2.0.10), RTA (v.1.17.21.3) and CASAVA
(v.1.8.2) software (Illumina). Reads were mapped to the reference 
human genome (hg19) by Novoalign v.3.02.04. Aligned reads were 
processed with Picard (v.2.0.1) to remove PCR duplicates. Variants were 
called by GATK v.2.7-4 following GATK Best Practice Workflow v.3°° 
and annotated by ANNOVAR (v.2016Mar30)“. All the variants of the 
candidate genes, which have been reported to cause SCD or congenital 
scoliosis, were evaluated using five databases: gnomAD, Human Gene 
Mutation Database (HGMD), SIFT, PolyPhen-2 and MutationTaster. 


Quality control of established iPS cells 

Morphological images of iPS cell colonies were captured using an 
Olympus CKX41 microscope with a PlanApo 10x 0.75 NA objective lens 
(Olympus) and Nikon digital camera DS-Fi1. Chromosomal G-banding
analyses were performed by Chromocentre. Genomic DNA and total 
RNA were extracted with AllPrep DNA or RNA mini kit (Qiagen) fol- 
lowing the manufacturer’s instructions. Genomic DNA was diluted to 
25 ng ml⁻¹ in distilled water. cDNA was synthesized using PrimeScript RT
Master Mix (Takara) from 500 ng total RNA and diluted 1:10 in RNase-free
water for expression analysis of OCT3/4 (also known as POU5F1)
and NANOG mRNA, and 1 μg total RNA for TaqMan hPSC Scorecard
analysis. OCT3/4 and NANOG mRNA expression were confirmed by 
RT-qPCR with TaqMan assay using StepOnePlus Real-Time PCR Sys- 
tems (Thermo Fisher). Primers and probe sequences are provided in 
Supplementary Table 7.3. The expression values of target genes were 
normalized by GAPDH expression from the same cDNA templates and 
calculated relative to 201B7 iPS cell line. Residual plasmids used for iPS 
cell establishment were analysed by TaqMan quantitative PCR using 
StepOnePlus Real-Time PCR Systems (Thermo Fisher). Primer and probe 
sequences (cmCAG: common-CAG) were designed on the CAG-promoter
region included in all of the episomal vectors for iPS cell generation and are
listed in Supplementary Table 7.3. The residual plasmid numbers were 
determined by a standard curve method with pCE-OCT3/4 episomal 
plasmid of known quantity using 50 ng genomic DNA of SCDP1 and 
SCDP2 iPS cells at passage 6. 


Initial validation of established iPS cells 

Established (patient) iPS cells together with control human PSCs were 
differentiated into ectoderm, mesoderm and endoderm lineages using 
STEMdiff Trilineage Differentiation Kit (Stemcell Technologies). Human 
PSCs reaching 70-80% confluence were collected with TrypLE Select 
Enzyme (1x) (Thermo Fisher) and plated as a single cell suspension in 
mTeSR1 medium (Stemcell Technologies) containing 10 µM Y-27632 
(Wako) on six-well plates coated with Matrigel (BD Biosciences). The 
cells were plated at 4.0 x 10° cells, 2.0 x 10° cells and 4.0 x 10° cells per 
well for ectoderm, mesoderm and endoderm differentiation culture, 
respectively, and differentiated following the manufacturer’s instructions. 
For FACS-based evaluation, undifferentiated PSCs and cells of each 
of the three germ layers (1.0 x 10° cells each) were fixed with 4% 
paraformaldehyde phosphate buffer solution (4% PFA/PBS) for 20 min at 
4 °C and washed twice with staining medium, which contained PBS 
with 2% fetal bovine serum (FBS). Samples were permeabilized with 
BD Perm/Wash buffer (BD Biosciences) for 15 min at room temperature 
and stained with fluorescence-conjugated antigen-specific and isotype 
antibodies listed in Supplementary Tables 8.3 and 8.4. The samples 
were washed with BD Perm/Wash buffer twice and suspended into 
staining medium. Flow cytometric analysis was performed using LSR 
(BD Biosciences). FACS data were analysed and graphs were generated 
using FlowJo software (FlowJo). For transcript-level assessment of 
differentiation capacity, qPCR was performed with a 384-well TaqMan hPSC 
Scorecard panel (Thermo Fisher) by QuantStudio 12K Flex Real-Time 
PCR System (Thermo Fisher) using cDNA samples from undifferentiated PSCs 
and each of the three germ layers. Pluripotency and the capacity to 
differentiate into ectoderm, mesoderm and endoderm lineages were 
scored by hPSC Scorecard Analysis software, which is available on the 
Thermo Fisher website (https://www.thermofisher.com/jp/en/home/ 
life-science/stem-cell-research/taqman-hpsc-scorecard-panel.html)”. 


Gene correction of patient-derived iPS cells 

Correction of mutations in patient-derived iPS cells was performed 
using the MhAX method as previously described”. In brief, donor plas- 
mids for correction of each mutant allele were created by PCR amplifi- 
cation of the homology arms from cloned haplotype-specific genomic 
DNA using the primers listed in Supplementary Table 7. For correction 
of MESP2 mutations, the right homology arm was amplified from SCDP1 
patient DNA corresponding to the matching mutant allele, and the left 
arm was amplified from normal 201B7 iPS cell DNA, which bears a similar 
haplotype. InFusion cloning (Clontech) was used to assemble the arms 
with a restriction-digested CAG::mCherry-IRES-puro selection cassette 
(Addgene plasmid 113876) and CAG::GFP plasmid backbone (Addgene 
plasmid 107281). PCR-amplified regions and InFusion junctions were 
verified by sequencing. Oligonucleotides encoding sgRNA proto- 
spacer sequences targeting MESP2 or DLL3 (Supplementary Table 7) 
were annealed and cloned into pX330-U6-Chimeric_BB-CBh-hSpCas9 
(a gift from F. Zhang, Addgene plasmid 42230) as described previously”. 
sgRNAs were verified by sequencing. For gene targeting, allele-matched 
donor plasmids (3 µg) and Cas9 and sgRNA expression plasmids (1 µg) 
were co-transfected by electroporation into 1 x 10° SCDP1 or SCDP2 
patient iPS cells, which were then divided and plated under feeder-free 
conditions for 48 h in AK02N medium (Ajinomoto) containing 10 µM 
ROCK inhibitor Y-27632 (Wako) before initiating antibiotic selection 
(0.5 µg ml−1 puromycin, Sigma-Aldrich). Nine days after plating, 
puromycin-resistant cells were pooled and passaged. For SCDP1, GFP−mCherry+ 
colonies were isolated, cultured, stored and processed for genomic 
DNA isolation under feeder-free conditions in 96-well format. iPS cell 
clones positive for PCR genotyping and sequencing were defrosted and 
expanded for genomic DNA extraction and Southern blot verification. 
For DLL3, GFP−mCherry+ cells were sorted and cultured as populations 
for subsequent cassette excision. For cassette excision from clones or 
populations, 3 µg of the pX-eGFP-g1 expression plasmid (Addgene plasmid 
107273) was transfected into 1 x 10° gene-targeted patient iPS cells, 
which were then divided and plated under feeder-free conditions for 
48 h in AK02N medium containing 10 µM Y-27632, followed by growth 
without selection for a total of 6 days. mCherry− cells were isolated by 
FACS on a BD FACSAria II cell sorter, and plated at low density for clonal 
isolation after 8 days. Isolated clones were cultured, stored in 96-well 
format, then genotyped for cassette excision by PCR and sequencing 
before final verification by Southern blot. 


Genomic DNA extraction 

Genomic DNA for PCR amplification and sequencing was isolated from 
0.5-1.0 x 10° iPS cells using a DNeasy Blood and Tissue Kit (Qiagen). 
Genomic DNA for Southern blotting was isolated from a single con- 
fluent well of a 6-well dish using lysis buffer (100 mM Tris-HCl, pH 8.5, 
5 mM EDTA, 0.2% SDS, 200 mM NaCl, and 1 mg ml−1 proteinase K) followed 
by phenol-chloroform extraction and ethanol precipitation 
from the aqueous phase. Genomic DNA was eluted from columns or 
resuspended from precipitate in TE pH 8.0. 


Southern blot analysis 

For MESP2 and DLL3 gene correction, patient and rescued iPS cells were 
analysed by Southern blotting as described previously”. Probe regions 
were PCR amplified with Ex Taq (Takara) directly from genomic DNA or 
cloned plasmid templates to incorporate DIG-labelled dUTP (Roche) 
using the primers described in Supplementary Table 7. Genomic DNA 
(5–10 µg) was digested with EcoRI (MESP2), HindIII (DLL3) or SacII 
(DLL3). SphI, a non-cutter at the DLL3 locus, was included in SacII 
digestions to reduce DNA viscosity. 


iPS cell genotyping and SNP array 

PCR primers flanking annotated coding exons of DLL3 (Accession 
NG_008256.1), HES7 (Accession NG_015816.1), LFNG (Accession 
NG_008109.2) and MESP2 (Accession NG_008608.1) were designed 
using NCBI Primer-BLAST with optional settings filtering human 
repeats and SNPs, with primer pair specificity checking to Homo sapiens 
(taxid:9606). PCR primers for genotyping gene-edited cell lines were 
designed using similar principles. All genotyping primers are listed in 
Supplementary Table 7. Genomic PCR was carried out using KAPA HiFi 
HotStart (KAPA Biosystems) on a Veriti 96-well Thermal Cycler (Applied 
Biosystems) according to the manufacturer’s instructions. Specific 
PCR conditions are available upon request. PCR products were treated 
with ExoSAP-IT Express (Affymetrix) and sequenced with the primers 
indicated in Supplementary Table 7 using BigDye Terminator v.3.1 Cycle 
Sequencing Kit (Applied Biosystems) on a 3130xl Genetic Analyzer 
(Applied Biosystems). Sequence analysis was performed using variant 
calling in Sequencher (Genecodes) or alignment in Snapgene (GSL Biotech). 
Genomic DNA from patient iPS cells and iPS cell clones rescued 
by gene editing was genotyped using an Infinium OmniExpress-24 v.1.2 
(Illumina) SNP array according to the manufacturer’s recommendations. 
Data collection was performed on an iScan Bead Array Scanner 
(Illumina). Data were compared to the reference human genome (hg19) 
using a combination of PennCNV, cnvPartition, GWASTools and MAD. 
Karyograms were prepared in R (v.3.2.5) using GWASTools (v.1.16.1). 


Reporting summary 


Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All RNA sequencing data used for this study have been deposited in 
the NCBI Gene Expression Omnibus (GEO) under accession number 
GSE116935. SNP array data in the current publication have been deposited 
in and are available upon application from the dbGaP database 
under accession number phs001975.v1.p1 and their use is limited to 
health, medical and biomedical purposes. Source Data for Figs. 1-4 
and Extended Data Figs. 1, 2, 5-12 are available in the online version 
of the paper. 


Code availability 


Computational codes and scripts used in this study are available at 
GitHub (https://github.com/mebisuya/SegmentationClock) and upon 
request from the corresponding authors. 


Extended Data Fig. 1 | Characterization of stepwise-induced human PSM. 
a, Heat map of gene expression levels in stepwise-induced human PSM and its 
derivatives (using iPS cell line 1231A3). FPKM values of each gene were 
normalized to the mean of all samples. The gene order is the same as in Fig. 1b. 
b, PCA plot of transcript expression levels in human PSM and derivatives of 
three independent experiments (1231A3), n = 3. Proposed RNA-seq-based 
developmental trajectory is shown in pink. c, RT-qPCR-based validation of 
RNA-seq results; data of four independent experiments with three technical 
replicates each using 201B7 are shown. Data are mean ± s.d., n = 4. Similar 
results were obtained for 1231A3 (data not shown). Open circles in some 
conditions indicate that there are fewer than four experiments because no Ct 
values for these samples were obtained after 45 cycles of PCR to calculate 
expression values. d, Representative flow cytometric evaluation of DLL1 and 
TBX6 (left) and DLL1 and brachyury (BRA) (right) expression at PSM stage 
(1231A3), n = 3. e, Representative expression of DLL1 at transcript level during 
in vitro differentiation (201B7). Data are mean ± s.d., n = 4. f, Representative 
expression of DLL1 at protein level, n = 3. Correlation of FACS data with 
RT-qPCR results (201B7) shown in e. 
Extended Data Fig. 2 | Characterization of human segmentation clock 
period in in vitro PSM. a, HES7 reporter activity in a 2D culture (the oscillation 
assay condition) and 3D spreading spheroid (the synchronization assay 
condition). Raw, detrended (±100 min window) and phase signals are shown. 
For spheroids, the signal was averaged over all area or ROIs indicated by the red 
line. 2D culture data are same as Fig. 1g and part of 3D-spheroid culture data are 
same as Fig. 1h. Data of three independent experiments are shown. Schematic 
depiction of reporter construct is shown on top. b, Human segmentation clock 
period quantification based on detrended and instantaneous phase signals. 
The period was calculated as the average peak-to-peak interval using the 1st to 
5th peaks. The measure of centre is mean, n = 3. c, Instantaneous phase-based 
kymograph of travelling-wave-like HES7 reporter activity in spheroid spreading 
assay shown in Fig. 1h. Representative data of three independent experiments 
are shown. 
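
The detrending and peak-to-peak period estimate described for Extended Data Fig. 2 can be sketched roughly as below. This is an illustrative outline only: the moving-average detrending, the simple local-maximum peak finder, the function names and the synthetic 300-min sinusoid are assumptions made for the example, and the authors' actual scripts may differ.

```python
import math


def moving_average_detrend(signal, dt_min, half_window_min=100):
    """Subtract a centred moving average spanning +/- half_window_min.

    ASSUMPTION: detrending by moving-average subtraction; near the ends of
    the trace the window is truncated, which distorts the first/last samples.
    """
    half = int(round(half_window_min / dt_min))
    detrended = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        window = signal[lo:hi]
        detrended.append(signal[i] - sum(window) / len(window))
    return detrended


def find_peaks(signal):
    """Indices of local maxima (suitable for smooth, low-noise traces only)."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]


def mean_period(signal, dt_min, first=1, last=5):
    """Average peak-to-peak interval (in minutes) from peak `first` to `last`."""
    peaks = find_peaks(signal)[first - 1:last]
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to estimate a period")
    intervals = [(b - a) * dt_min for a, b in zip(peaks, peaks[1:])]
    return sum(intervals) / len(intervals)


# Synthetic example: a pure sinusoid sampled every 5 min (hypothetical data).
signal = [math.sin(2 * math.pi * i * 5 / 300) for i in range(401)]
print(mean_period(signal, dt_min=5))
```

For real reporter traces, some smoothing or a minimum peak prominence would normally be needed before peak picking, because the naive local-maximum test above counts every noise spike as a peak.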
Extended Data Fig. 3 | Characterization of induced human PSM-derivatives, 
somitic mesoderm, sclerotome and dermomyotome. 
a, Representative immunofluorescence staining of PSM markers TBX6 and 
brachyury (BRA) and somitic mesoderm marker TCF15 at PSM stage, n = 3; 
entire wells (left) and magnified views of selected areas. b, Representative 
immunofluorescence staining of PSM markers TBX6 and BRA and somitic 
mesoderm marker TCF15 at stage, n = 3; entire wells (left) and magnified views 
of selected areas. Bottom, staining of segmentation marker MESP2 (alone or 
co-staining with TBX6). Scale bar, 100 µm. c, Representative 
immunofluorescence of dermomyotome markers (PAX7 and PRRX1) and 
sclerotome marker (FOXC2) at dermomyotome and sclerotome stages (201B7), 
n = 3; entire wells (left) and magnified views of selected areas (right). Staining of 
PAX7 (epithelial colonies) at dermomyotome and FOXC2 (mesenchymal 
colonies) at sclerotome stage. PRRX1 staining surrounding PAX7+ areas is 
specific to dermomyotome stage. Scale bar, 100 µm. 
Extended Data Fig. 4 | Functional evaluation of human iPS cell-derived 
sclerotome. a, Assessment of in vivo bone- and cartilage-forming ability of 
human induced sclerotome. Subcutaneous transplantation of PSC-derived 
sclerotome stepwise-induced from healthy control or wild-type (1231A3) and 
luciferase-reporter iPS cell lines (625-D4 and 625-A4). Evaluation of 
transplanted cells using IVIS at two months after transplantation; injection 
sites are marked with dashed or coloured circles. Cartilage and bone-forming 
areas of wild-type iPS cell line (1231A3) marked by white arrows. b, Whole-mount 
images of wild-type sclerotome-derived in vivo cartilage and bone 
tissues isolated from transplanted mice 1 and 3. Explant isolated from mouse 2 
is shown in d. Scale bar, 4 mm. c, Representative staining of in vitro human 
sclerotome-derived cartilage (from 3D chondrogenic induction) sections. 
Observed safranin O and type II collagen (COL2) signals are indicative of in vitro 
cartilage formation, n = 3. d, Representative whole-mount (top left) and 
histological staining of section (bottom left) of human induced 
sclerotome-derived in vivo cartilage and bone. Scale bar, 100 µm. Representative 
pentachrome staining of marked area reminiscent of in vivo human 
endochondral bone formation; n = 3. I, proliferative human cartilage; II, 
hypertrophic cartilage; III, ossifying cartilage and forming human bone. Scale 
bar, 100 µm. e, Representative sections and staining of area shown in d. 
Safranin O and COL2 staining in human in vivo sclerotome-derived cartilage 
areas; von Kossa and COL1 staining in ossifying cartilage and forming bone 
areas. Majority of cells contributing to cartilage or bone formation are 
HNA-positive and of human origin (right bottom); n = 3. Scale bar, 100 µm. 
Extended Data Fig. 5 | Functional evaluation of human iPS cell-derived 
dermomyotome. a, Evaluation of in vitro muscle induction from human 
induced dermomyotome. Myosin and sarcomeric α-actinin (SAA) staining of 
in vitro dermomyotome-derived skeletal muscle; representative images of 
entire well (left) and magnified areas (right); n = 3. Scale bar, 100 µm. 
b, Comparison of skeletal muscle induction of human iPS cell, and iPS 
cell-derived sclerotome and dermomyotome. Representative myosin heavy chain 
(MYH), myosin and sarcomeric α-actinin staining only apparent in 
dermomyotome-based skeletal muscle differentiation. Right, magnified areas; 
n = 3. c, Quantification of contracting colonies and GFP-positive foci of iPS cell-, 
sclerotome- and dermomyotome-derived human skeletal muscle. 
Calcium-reporter iPS cell line (Gen1C) was used in all cases. Measurements of total 18 
view fields in 6 independent experiments. In box-and-whisker plots, the middle 
line represents median value, box edges represent 25th and 75th percentiles and 
error bars show extreme values. d, Representative quantification of calcium 
GFP-reporter activity in iPS cell, sclerotome and dermomyotome as readout of 
spontaneous contraction-mediated GFP signal in induced human skeletal 
muscle cells; n = 3. 
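
The box-and-whisker convention described in panel c of Extended Data Fig. 5 (median line, box at the 25th and 75th percentiles, whiskers at the extreme values) can be reproduced with a short summary function. This is a sketch under stated assumptions: the function name and the example counts are hypothetical, and the "inclusive" interpolation rule is one of several percentile conventions, not necessarily the one used for the figure.

```python
import statistics


def box_plot_summary(values):
    """Return the five numbers drawn in a median/quartile/extremes box plot."""
    # ASSUMPTION: quartiles by linear interpolation, treating the data as
    # the full population ("inclusive" method in the statistics module).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),   # lower whisker (extreme value)
        "q1": q1,             # lower box edge (25th percentile)
        "median": median,     # middle line
        "q3": q3,             # upper box edge (75th percentile)
        "max": max(values),   # upper whisker (extreme value)
    }


# Hypothetical counts of contracting colonies per view field (not real data).
example_counts = [1, 2, 3, 4, 5]
print(box_plot_summary(example_counts))
```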
Extended Data Fig. 6 | RNA-seq analysis of human iPS cell-derived oscillating 
PSM. a, Sampling of human oscillating PSM samples for RNA-seq. HES7 
reporter activity was continuously monitored with one sample, and the other 
samples were frozen at each time point indicated in the graph. 
b, Three-dimensional synchronization (spheroid-spreading) assay following inhibition 
of FGF (PD173074, 100 nM), Notch (DAPT, 10 mM), and Wnt (XAV939, 10 mM) 
signalling pathways. The HES7 reporter signal was first averaged over the whole area, 
the background was subtracted and the signal was normalized to time 0. The 
background was defined as the average signal at time 0 over the 15 × 15-pixel 
area of the top left corner of the image. Representative graph of three 
independent experiments is shown. See also Supplementary Video 3. 
c, Average HES7 reporter intensity during 36-41 h (2,160-2,440 min) of 
inhibitor treatment. Data are mean ± s.d., n = 3; two-sided Dunnett’s test. 
*P < 0.05, **P < 0.01, ***P < 0.001. d, Additional validation of RNA-seq results by 
RT-qPCR for phase and antiphase oscillating genes showing specific 
oscillatory expression in human iPS cell-derived PSM but not in mouse 
EpiSC-derived PSM. Data are shown for two independent biological datasets with 16 
samples each. See also Fig. 2c. e, RT-qPCR validation of phase and antiphase 
oscillating murine genes found to oscillate in mouse EpiSC-derived PSM. Same 
genes show oscillation in human in vitro PSM. f, RT-qPCR validation of phase 
and antiphase oscillating genes identified by RNA-seq in human induced PSM. 
These genes were also validated to show clear oscillation in mouse 
EpiSC-derived PSM. See also Fig. 2d. In e, f, mean values of three technical replicates of 
two independent experiments (Ex1 and Ex2) for each time point and sample set 
are shown. 
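
The signal processing described for panel b of Extended Data Fig. 6 (whole-area averaging, subtraction of a corner background measured at time 0, normalization to time 0) can be sketched as follows. This is an illustrative outline rather than the authors' script: the function names and the toy 20 × 20-pixel frames are hypothetical, and "normalized to time 0" is assumed here to mean division by the background-subtracted value of the first frame.

```python
def region_mean(frame, n_rows, n_cols):
    """Mean pixel value over the top-left n_rows x n_cols region of a frame."""
    if len(frame) < n_rows or len(frame[0]) < n_cols:
        raise ValueError("frame is smaller than the requested region")
    total = sum(sum(row[:n_cols]) for row in frame[:n_rows])
    return total / (n_rows * n_cols)


def normalized_reporter_trace(frames, corner=15):
    """Whole-area mean per frame, background-subtracted, relative to time 0.

    The background is the mean of the top-left corner x corner pixels of the
    FIRST frame only, as described in the legend.
    """
    background = region_mean(frames[0], corner, corner)
    trace = []
    for frame in frames:
        whole_area = region_mean(frame, len(frame), len(frame[0]))
        trace.append(whole_area - background)
    if trace[0] == 0:
        raise ValueError("signal at time 0 equals background; cannot normalize")
    # ASSUMPTION: normalization means dividing by the time-0 value.
    return [value / trace[0] for value in trace]


def make_frame(signal_level, size=20, corner=15, background_level=10.0):
    """Toy frame: dim top-left corner, uniform signal elsewhere (hypothetical)."""
    return [
        [background_level if r < corner and c < corner else signal_level
         for c in range(size)]
        for r in range(size)
    ]


frames = [make_frame(50.0), make_frame(90.0)]
print(normalized_reporter_trace(frames))
```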
Extended Data Fig. 7 | RNA-seq analysis of mouse EpiSC-derived oscillating 
PSM. a, Heat map of normalized gene expression levels for oscillating genes in 
mouse in vitro-derived PSM. RNA-seq results shown for two independent 
biological datasets with 16 samples each. Examples of identified phase and 
antiphase oscillating genes are highlighted on the right. Oscillating mouse 
genes marked in red and blue match with high- and low-stringency cut-off 
setting identified oscillating human induced-PSM genes, respectively. 
Unambiguously phase- or antiphase oscillating genes are highlighted on the 
left; solid and dotted black lines indicate unambiguous and ambiguous genes, 
respectively. See Supplementary Table 4 for complete list of identified 
high-stringency cut-off oscillating genes in mouse in vitro-derived PSM. See also 
Fig. 2 and Supplementary Table 2 for RNA-seq results of oscillating human 
segmentation clock genes identified in human iPS cell-derived PSM. 
b, Sampling of mouse oscillating PSM samples for RNA-seq. Hes7 reporter 
activity was continuously monitored with one sample, and the other samples 
were frozen at each time point indicated in the graph. c, RT-qPCR validation of 
identified mouse phase and antiphase oscillating genes. See also Fig. 2d and 
Extended Data Fig. 6e for validation of additional mouse oscillating genes. 
Mean values of three technical replicates of two independent experiments (Ex1 
and Ex2) for each time point and sample set are shown. d, Results obtained for 
dual luciferase-reporter assay of HES7 reporter (NanoLuc) and DKK reporter 
(Luciferase2) in human PSC-derived PSM. The signal was detrended (±2-h 
window) and normalized to the maximum oscillation peak. Representative 
graph of three independent experiments is shown. Top, schematic overview of 
reporter constructs. e, Schematic overview of 2D-oscillation and 
3D-synchronization assays. 
Extended Data Fig. 8 | Characterization of knockout human reporter cell 
lines. a, Overview of knockout reporter cell line generation for HES7, DLL3, 
LFNG and MESP2 genes. Positions of the sgRNAs used in this study are shown. 
sgRNAs were designed to target at or near regions of known pathogenic 
mutations, particularly those resulting in frameshifts and premature 
termination. Sequence analysis of iPS cell clones used in this study indicating 
insertion or deletion mutations generated by Cas9. Predicted effects on the 
protein sequence are listed below the sequence alignments. b, Damping rate of 
oscillation amplitude in knockout human PSMs. The signal of all area was 
averaged and detrended (±100-min window). See also Fig. 3d for quantification 
of shown data, n = 3. c, Summary of results of oscillation and synchronization 
assays. See Fig. 3a–d for details. d, Flow cytometric evaluation of DLL1 
expression at PSM stage of healthy control and knockout human iPS cell lines. 
Blue, isotype control; red, DLL1-APC. PSM induction efficiency is high in all 
analysed samples; slight reduction of DLL1 induction efficiency in 
LFNG-knockout cell lines. Representative results of three independent experiments 
of two different knockout lines for each gene are shown (HES7 KO #1 and #8, 
DLL3 KO #2 and #6, LFNG KO #2 and #12, and MESP2 KO #7 and #11); n = 3. 
e, Scatter plot of transcriptome analysis of wild-type and knockout cell lines at 
iPS cell and PSM stages. Positions of expression values for MESP2, DLL3, LFNG 
and HES7 are highlighted with coloured arrows. Data are averages of two 
biological replicates, n = 2. 
Extended Data Fig. 9 | Overview of HES7 R25W mutant cell line generation and 
initial characterization of patient iPS cell lines SCDP1 and SCDP2. 
a, Schematic overview of the stepwise HES7-targeting approach for 
ssODN-mediated recreation of HES7 R25W mutant cell lines. The first round of 
CRISPR-Cas9 targeting with ssODN resulted in a compound heterozygous line with the 
desired c.73C>T base modification and a 5-bp deletion (c.70_74delCGCCG). The 
c.70_74delCGCCG deletion creates a new PAM site. In the second targeting 
step, the c.70_74delCGCCG allele was retargeted with a sgRNA specific to the 
deletion, and correction with the same ssODN resulted in a homozygous 
c.73C>T iPS cell line. b, Representative bright-field views of SCDP1 (SCDP1-A 
and SCDP1-F) and SCDP2 (SCDP2-A and SCDP2-E) iPS cell clones. 
Representative data of five independent experiments are shown. Scale bar, 
500 µm. c, Normal karyotype (46, XX) in both clones of SCDP1 patient iPS cell 
line by chromosomal G-banding analysis. Data from passage 10 are shown. 
d, Expression of pluripotency markers OCT3/4 and NANOG in SCDP1 and SCDP2 
clones compared with iPS cell line (201B7). Quantification of residual plasmid 
levels in SCDP1 and SCDP2 clones (right); mean value (horizontal bar) of three 
technical replicates for each of the four analysed clones are shown. e, FACS-based 
evaluation of differentiation capacity into three germ layers of healthy control 
(H9 hESC) and patient cell lines (SCDP1-A and SCDP1-F, SCDP2-A and SCDP2-E). 
Representative data of three independent experiments are shown; n = 3. 
f, Quantification of differentiation capacity of healthy control and patient cell 
lines into ectoderm, mesoderm and endoderm at the transcript level by 
TaqMan hPSC scorecard panel. Top, SCDP1-A and SCDP1-F; bottom, SCDP2-A 
and SCDP2-E. Same H9 hESC control data shown in both panels. Data of three 
independent experiments are shown; n = 3. g, X-ray and MRI images of a patient 
with SDV with a DLL3 mutation (donor of SCDP2 iPS cell clones). Radiological 
images were obtained at Meijo Hospital, Nagoya, Japan with patient consent. 
Black bars were added to anonymize the image. See Supplementary Note 1 for 
details of clinical and radiological features of the patient. 
Extended Data Fig. 10 | Analysis of patient and rescue iPS cell line-derived 
PSM and allele-specific gene correction of SCDP1 patient iPS cell lines. 
a, Representative DLL1 expression in iPS cells (grey) and PSMs (red) derived 
from a patient with SCD with a compound mutation in MESP2 (SCDP1-A and 
SCDP1-F), a patient with SCD with a mutation in DLL3 (SCDP2-A and SCDP2-E) 
and corresponding isogenic rescue cell lines (SCDP1-resA1, SCDP1-resF3, 
SCDP2-resA12 and SCDP2-resE17). n = 3; data for SCDP1-A and SCDP2-E are also 
used for Fig. 4b. b, Three-dimensional synchronization assay of SCDP1 patient 
PSM. Representative kymograph of three independent experiments is shown. 
c, Representative measurement of HES7 reporter activity in PSM derived from 
SCDP1 patient cell line. After the spike noise was removed, the signal of the 
entire area was averaged. The signal was further detrended and normalized to 
the average (±100-min window). d, Top, representative genotype of patients 
with SCD and iPS cells (SCDP1) with compound heterozygous mutations in 
MESP2. Bottom, sequence of each haplotype from patient genomic DNA. Red 
triangle indicates a deletion. Black triangle indicates a single nucleotide 
variation. e, Schematic of the gene-targeting procedure for allele-specific 
correction of MESP2 mutations using MhAX. Details of the targeting and 
genotyping procedures are provided in g. f, Genotype of heterozygously 
corrected iPS cell subclones. 201B7 is included as a reference. Red triangle 
indicates a deletion. Black triangle indicates a single nucleotide variation. DNA 
sequencing was performed twice for each clone; n = 2. g, Detailed schematic of 
gene-correction strategy of SCDP1 patient iPS cell clones. Depicted are two 
mutant or corrected MESP2 alleles with coding and non-coding exons (grey and 
white), overlapping donor vector homology arms (HA-L and HA-R), engineered 
51-bp microhomology (µ51, blue), inverted protospacers for cassette excision 
(ps1, green), genotyping primers (red arrows) and Southern blotting probes 
(black bars). Sequences of mutation-specific sgRNAs are shown below each 
mutant allele. The gene-targeted intermediate shows details of the 
CAG::mCherry-IRES-puro cassette used for enrichment. h, Southern blot 
analysis of targeted iPS cell clones. Samples marked with an asterisk were 
selected for cassette excision. i, Southern blot analysis of gene-corrected iPS 
cell clones following selection marker removal. Samples marked with an 
asterisk were selected for phenotyping (067-1-3, SCDP1-resA1; 067-2-5, 
SCDP1-resF2; 067-3-4, SCDP1-resF3). Southern blots shown in h and i were 
performed once for two patient and rescue clones each. For gel source data of h 
and i, see Supplementary Fig. 1. j, k, Resulting karyograms from SNP array 
analysis of SCDP1 patient iPS cell clone A (SCDP1-A) and corresponding rescued 
iPS cell line (SCDP1-resA1). l–n, Karyograms from SNP array analysis of iPS cell 
clone F (SCDP1-F) from a patient with SDV and corresponding rescued iPS cell 
lines (SCDP1-resF2/F3). No de novo CNVs were detected following gene editing 
and subcloning. These figures were created with Illumina Genome Viewer 
(v.1.9.0) on Illumina GenomeStudio v.2011.1 with Human:Build 37 genome. 
Extended Data Fig. 11 | RNA-seq analysis and RT-qPCR validation of SCDP1 
patient and rescue samples. a, Heat map of gene expression levels of 
transcripts differentially expressed in patient cell lines SCDP1-A and SCDP1-F, 
when compared to wild-type (201B7) and corrected rescue clones (SCDP1-resA 
(A1) and SCDP1-resF (F2 and F3)). Analysis covers all stages of stepwise PSM 
induction and differentiation and for MESP2-knockout cell lines all stages 
except primitive streak. For somitic mesoderm-stage data see Fig. 4d. 
b, RT-qPCR-based validation of additional candidates found via RNA-seq to be 
upregulated in SCDP1 patient cell lines at the somitic mesoderm stage. Data are 
mean ± s.d. from three independent experiments. 
Extended Data Fig. 12 | Gene correction and analysis of SCDP2 patient iPS 
cell lines. a, Representative genotype of cells from patients with SCD and iPS 
cells (SCDP2) with mutation in DLL3; 201B7 is included as a reference. Red 
triangle indicates insertion. b, Schematic of the gene-targeting procedure for 
allele-specific correction of DLL3 mutation using MhAX. Details for the 
targeting and genotyping procedures are provided in d. The synonymous 
c.615C>G PAM blocking mutation is present only in the 3’ microhomology. 
c, Genotype of homozygously corrected iPS cell subclones (SCDP2-resA and 
SCDP2-resE). Black triangle indicates the synonymous blocking mutation. DNA 
sequencing performed twice for each clone; n = 2. d, Detailed schematic of 
gene-correction strategy of SCDP2 patient iPS cell clones. Depicted are mutant 
or corrected DLL3 alleles with coding and non-coding exons (grey and white), 
overlapping donor vector homology arms (HA-L and HA-R), engineered 30-bp 
microhomology (µ30, blue), inverted protospacers for cassette excision (ps1, 
green), genotyping primers (red arrows), and Southern blotting probes (black 
bars). The same sgRNA used to generate DLL3-knockout iPS cell lines was used 
for gene targeting. The gene-targeted intermediate shows details of the 
CAG::mCherry-IRES-puro cassette used for enrichment and FACS sorting of 
targeted cells as a population. Excision was performed without intermediate 
cloning. Owing to the c.615C/G mismatch between flanking microhomologies, 
two repair outcomes are possible. e, Southern blot analysis of gene-corrected 
iPS cell clones following selection marker removal. Samples marked with an 
asterisk were selected for further characterization, with SCDP2-resE17 and 
SCDP2-resE43 used for analysis of oscillation phenotypes (Fig. 4f, g). For gel 
source data for e, see Supplementary Fig. 1. f, HES7 reporter activity in 3D 
synchronization assay of PSM derived from SCDP2 patient and isogenic rescue 
cell lines (SCDP2-resE17 and SCDP2-resE43). After the spike noise was removed, 
the signal of the entire area was averaged. The signal was further detrended and 
normalized to the average (±100-min window). Representative graphs of three 
independent experiments are shown. See also Fig. 4g. 
Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


x The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


x A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


x The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


x A description of all covariates tested 


x A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


x For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


x For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


x Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection: QuantStudio 12K Flex Software (v1.2.4), BD FACSDiva software (LSR: v.8.0.1, Aria II: v.8.0.2), Living Image (v4.0), Kronos control software 
(v2.3, Atto), Metamorph (v7.6, Molecular Devices), NIS-Elements AR (v4.20.00) 


Data analysis: cutadapt-1.14, HISAT2 (v2.1.0), Cufflinks package (v2.2.1), HTSeq (v0.6.1), edgeR (v3.18.1), ARSER (v2.0), DAVID web tools (DAVID 6.8) 
(https://david.ncifcrf.gov/), IPA (QIAGEN Inc., https://www.qiagenbioinformatics.com/products/ingenuitypathway-analysis), HCS 
(v2.0.10), RTA (v1.17.21.3), CASAVA (v1.8.2), Picard (v2.0.1), ANNOVAR (v2016Mar30), Sequencher (v5.1), Snapgene (v3.1.4 to v5.0.4), 
GWASTools (v1.16.1), PennCNV (v1.0.3), cnvPartition (v3.2.0), MAD (v1.0.1), Metamorph (v7.6, Molecular Devices), Matlab (v2018b, 
MathWorks), Fiji (v1.52p, ImageJ), Excel (Microsoft Office 2011/2016), R (v3.2.5, v3.3.1, v3.4.2, v3.5.1), FlowJo (v10.6.1) 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 


All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


All RNA sequencing data utilized for this study have been deposited in Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under the accession number 
GSE116935. SNP array data in the current publication have been deposited in and are available upon application from the dbGaP database under accession 
phs001975.v1.p1 (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001975.v1.p1) and their use is limited to health/medical/biomedical 
purposes. Utilized computational codes & scripts are available at GitHub (https://github.com/mebisuya/SegmentationClock) and from the corresponding authors 
upon request. Source Data for Figs. 1-4 and Extended Data Figs. 1, 2, 5-12 are available in the online version of the paper. 

Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size No statistical method was used to predetermine the sample size. All experiments were performed (if not otherwise stated) with at least two 
(most of the time three) independent experiments, yielding similar and reproducible results. 

Data exclusions No samples/data were excluded. 


Replication Each experiment was reproduced and performed at least two times, with multiple biological and/or technical replicates if not otherwise 
stated. See figure legends and methods section for details. 


Randomization No particular randomization method was utilized. Animals used for experiments were randomly allocated. 


Blinding The investigators were not blinded during data collection and analysis. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems 
Involved in the study: Antibodies; Eukaryotic cell lines; Animals and other organisms; Human research participants 
n/a: Palaeontology; Clinical data 

Methods 
Involved in the study: Flow cytometry 
n/a: ChIP-seq; MRI-based neuroimaging 


Antibodies 


Antibodies used All antibodies used in this study are commercially available antibodies, which were validated by suppliers and other researchers 
in the field. Details on used antibodies are listed in Supplementary Table 8. 


Primary antibodies used for immunocytochemistry 

BRACHYURY, R&D Systems, #:AF2085, Lot: KQP0618021, Polyclonal Goat IgG, Dilution 1:100 
COL1, Southern Biotech, #:1310-01, Lot: B2918-T858, Polyclonal Goat IgG, Dilution 1:200 
COL2, Southern Biotech, #:1320-01, Lot: JO513-S328, Polyclonal Goat IgG, Dilution 1:200 
FOXC2, DSHB, #:1B6, Lot: 8/17/17, Monoclonal Mouse IgG1, Dilution 1:10 

HNA, Merck, #:MAB1281, Lot: 2366521, Clone: 235-1, Dilution 1:50 

MESP2, DSHB, #:1D4, Lot: 6/1/17, Monoclonal Mouse IgG2b, Dilution 1:10 

MYH, Abcam, #:ab91506, Lot: GR11678-3, Polyclonal Rabbit IgG, Dilution 1:2000 
MYOSIN, DSHB, #:MF20-s, Lot: 12/7/17, Monoclonal Mouse IgG2b, Dilution 1:20 

PAX7, DSHB, #:PAX7-s, Lot: 3/8/18, Monoclonal Mouse IgG1, Dilution 1:10 

PRRX1, Sigma-Aldrich, #:HPA051084, Lot: G114643, Polyclonal Rabbit IgG, Dilution 1:100 
SAA, Abcam, #:ab9465, Lot: GR266197-2, Clone: EA-53, Dilution 1:1000 

TBX6, R&D Systems, #:AF4744, Lot: CAPTO217111, Polyclonal Goat IgG, Dilution 1:100 
TCF15, Abcam, #:ab204045, Lot: GR268168-3, Polyclonal Rabbit IgG, Dilution 1:50 


Secondary antibodies used for immunocytochemistry 
Alexa Fluor® 488 Donkey Anti-Rabbit IgG (H+L), Invitrogen, #:A-21206, Lot: 2072687, Dilution 1:500 
Alexa Fluor® 488 Goat Anti-Mouse IgG (H+L), Invitrogen, #:A-10680, Lot: 1917945, Dilution 1:500 
Alexa Fluor® 555 Donkey Anti-Goat IgG H&L, Abcam, #:ab150130, Lot: GR226396-1, Dilution 1:500 
Alexa Fluor® 555 Goat Anti-Mouse IgG (H+L), Invitrogen, #:A-21422, Lot: 1180091, Dilution 1:500 
Alexa Fluor® 647 Donkey Anti-Mouse IgG (H+L), Invitrogen, #:A-31571, Lot: 2045337, Dilution 1:500 
Donkey Anti-Rabbit IgG Cy3, Merck, #:AP182C, Lot: 2984232, Dilution 1:500 
Alexa Fluor® 647 Phalloidin, Invitrogen, #:A-2228, Lot: 2101947, Dilution 1:100 


Primary antibodies used for flow cytometry 

BRACHYURY-PE, R&D Systems, #:IC2085P, Lot: LVBO213071, Polyclonal Goat IgG, Dilution 1:50 
DLL1-APC, R&D Systems, #:FAB1818A, Lot: AAYU0216061, Clone: 251127, Dilution 1:200 
FOXA2-PE, BD Biosciences, #:561589, Lot: 7090765, Clone: N17-280, Dilution 1:50 
NANOG-Alexa Fluor® 488, BD Biosciences, #:560791, Lot: 7110923, Clone: N31-355, Dilution 1:25 
NCAM-BV421, BioLegend, #:318328, Lot: B241630, Clone: HCD56, Dilution 1:25 
OCT3/4-Alexa Fluor® 647, BD Biosciences, #:560329, Lot: 7201923, Clone: 40/Oct3, Dilution 1:25 
PAX6-Alexa Fluor® 488, BD Biosciences, #:561664, Lot: 7174912, Clone: 018-1330, Dilution 1:25 
SOX2-BV421, BioLegend, #:656114, Lot: B208882, Clone: 14A6A34, Dilution 1:50 
SOX17-Alexa Fluor® 647, BD Biosciences, #:562594, Lot: 7104800, Clone: P7-969, Dilution 1:25 
Anti-TBX6, R&D Systems, #:AF4744, Lot: CAPTO217111, Polyclonal Goat IgG, Dilution 1:25 


Secondary antibodies & isotype controls used for flow cytometry 

Alexa Fluor® 488 Anti-Goat IgG, Abcam, #:ab150129, Lot: GR246088-1, Polyclonal Goat IgG, Dilution 1:50 
APC-conjugated Mouse IgG2b,k, BD Biosciences, #:555745, Lot: B163785, Clone: MG2b-57, Dilution 1:200 
PE-conjugated Goat IgG, R&D Systems, #:IC108P, Lot: LVD0811021, Polyclonal Goat IgG, Dilution 1:50 
Unconjugated Goat IgG, R&D Systems, #:AB108C, Lot: ES4119031, Polyclonal Goat IgG, Dilution 1:25 
Alexa Fluor® 647-conjugated Mouse IgG1,k, BioLegend, #:400130, Lot: B205347, Clone: MOPC-21, Dilution 1:100 
Alexa Fluor® 488-conjugated Mouse IgG1,k, BioLegend, #:400129, Lot: B277964, Clone: MOPC-21, Dilution 1:25 
Alexa Fluor® 488-conjugated Mouse IgG2a,k, BioLegend, #:400233, Lot: B286502, Clone: MOPC-173, Dilution 1:25 
PE-conjugated Mouse IgG1,k, BioLegend, #:400112, Lot: B244597, Clone: MOPC-21, Dilution 1:50 
BV421-conjugated Mouse IgG1,k, BioLegend, #:400158, Lot: B243837, Clone: MOPC-21, Dilution 1:50 


Validation Besides initial validation of utilized (commercial) primary antibodies by manufacturers/suppliers, antibodies were 
validated/tested for possible signals at unrelated stages/controls; only antibodies with reproducible (differentiation stage) 
specific signals were used. 


Primary antibodies used for immunocytochemistry 


BRACHYURY, R&D Systems, AF2085 (https://www.rndsystems.com/products/human-mouse-brachyury-antibody_af2085) 
Anti-BRACHYURY antibody was validated by the manufacturer by using human embryonic stem cells. There are 60 citations. 


COL1, Southern Biotech, 1310-01 (https://www.southernbiotech.com/?catno=1310-01&type=Polyclonal#&panel1-5&panel2-1) 
Anti-COL1 antibody was validated by the manufacturer by using rat kidney section postuninephrectomy. There are 35 citations. 


COL2, Southern Biotech, 1320-01 (https://www.southernbiotech.com/?catno=1320-01&type=Polyclonal#&panel1-1&panel2-1) 


Anti-COL2 antibody was validated by the manufacturer by using newborn mouse rib section, mouse tibial growth plate section, 
and mouse cartilage section. There are 34 citations. 


FOXC2, DSHB, 1B6 (https://dshb.biology.uiowa.edu/PCRP-FOXC2-1B6) 
Anti-FOXC2 antibody was validated by the manufacturer by using human samples. There is 1 citation. 


HNA, Merck, MAB1281 (http://www.merckmillipore.com/JP/en/product/Anti-Nuclei-Antibody-clone-235-1,MM_NF-MAB1281) 
Anti-HNA antibody was validated by the manufacturer by using human neural stem cells. There are 126 citations. 


MESP2, DSHB, 1D4 (https://dshb.biology.uiowa.edu/PCRP-MESP2-1D4) 
Anti-MESP2 antibody was validated by the manufacturer by using human samples. There is 1 citation. 


MYH, Abcam, ab91506 (https://www.abcam.co.jp/fast-myosin-skeletal-heavy-chain-antibody-ab91506.html) 
Anti-MYH antibody was validated by the manufacturer by using sheep muscle tissue frozen section. There are 24 citations. 


MYOSIN, DSHB, MF20-s (https://dshb.biology.uiowa.edu/MF-20) 


Anti-MYOSIN antibody was validated by the manufacturer by using samples of amphibian, avian, chicken, fish, human, lizard, 
mammal, snake, xenopus, zebrafish. There are 121 citations. 


PAX7, DSHB, PAX7-s (https://dshb.biology.uiowa.edu/PAX7) 


Anti-PAX7 antibody was validated by the manufacturer by using samples of amphibian, avian, bovine, canine, fish, goat, human, 
mouse, ovine, porcine, rat, turtle, xenopus, zebrafish. There are 73 citations. 


PRRX1, Sigma-Aldrich, HPA051084 (https://www.sigmaaldrich.com/catalog/product/sigma/hpa051084?lang=en&region=US) 
Anti-PRRX1 antibody was validated by the manufacturer by using human malignant glioma. There are 4 citations. 


SAA, Abcam, ab9465 (https://www.abcam.co.jp/sarcomeric-alpha-actinin-antibody-ea-53-ab9465.html) 


Anti-SAA antibody was validated by the manufacturer by using mouse heart tissue, H9 hESC and CBiPSC6.2 cells. There are more 
than 100 citations. 


TBX6, R&D Systems, AF4744 (https://www.rndsystems.com/products/human-tbx6-antibody_af4744) 


Anti-TBX6 antibody was validated by the manufacturer by using embryonic mouse mesoderm (E9.5) and JOY6 human induced 
pluripotent stem cells undifferentiated and differentiated into mesoderm. There are 3 citations. 


TCF15, Abcam, ab204045 (webpage is closed due to discontinuation of antibody.) 
Anti-TCF15 antibody was validated by the manufacturer by using human lateral ventricle tissue. 


Antibodies used for flow cytometric analysis 


BRACHYURY-PE, R&D Systems, IC2085P (https://www.rndsystems.com/products/human-mouse-brachyury-pe-conjugated-antibody_ic2085p) 
BRACHYURY-PE was validated by the manufacturer by using D3 mouse cell line by flow cytometry. There are 3 citations. 


DLL1-APC, R&D Systems, FAB1818A (https://www.rndsystems.com/products/human-dll1-apc-conjugated-antibody-251127_fab1818a) 
DLL1-APC was validated by the manufacturer by using T98G human glioblastoma cell line by flow cytometry. There are 13 
citations. 


FOXA2-PE, BD Biosciences, 561589 (https://www.bdbiosciences.com/us/applications/research/intracellular-flow/intracellular-antibodies-and-isotype-controls/anti-human-antibodies/pe-mouse-anti-human-foxa2-n17-280/p/561589) 
FOXA2-PE was validated by the manufacturer by using definitive endoderm derived from H9 human embryonic stem (ES) cells. 
There are 6 citations. 


NANOG-Alexa Fluor® 488, BD Biosciences, 560791 (https://www.bdbiosciences.com/eu/applications/research/intracellular-flow/intracellular-antibodies-and-isotype-controls/anti-human-antibodies/alexa-fluor-488-mouse-anti-human-nanog-n31-355/p/560791) 
NANOG-Alexa Fluor® 488 was validated by the manufacturer by using H9 human embryonic stem cells. There are 7 citations. 


NCAM-BV421, BioLegend, 318328 (https://www.biolegend.com/en-us/products/brilliant-violet-421-anti-human-cd56-ncam-antibody-7143) 
NCAM-BV421 was validated by the manufacturer by using human peripheral blood lymphocytes. There are 13 citations. 


OCT3/4-Alexa Fluor® 647, BD Biosciences, 560329 (https://www.bdbiosciences.com/us/applications/research/stem-cell-research/cancer-research/human/alexa-fluor-647-mouse-anti-oct34-40oct-3/p/560329) 
OCT3/4-Alexa Fluor® 647 was validated by the manufacturer by using H9 human embryonic stem (ES) cells. There are 6 citations. 


PAX6-Alexa Fluor® 488, BD Biosciences, 561664 (https://www.bdbiosciences.com/us/applications/research/intracellular-flow/intracellular-antibodies-and-isotype-controls/anti-human-antibodies/alexa-fluor-488-mouse-anti-human-pax-6-018-1330/p/561664) 
PAX6-Alexa Fluor® 488 was validated by the manufacturer by using neural induction of H9 human embryonic stem (ES) cells. 
There are 4 citations. 


SOX2-BV421, BioLegend, 656114 (https://www.biolegend.com/en-us/search-results/brilliant-violet-421-anti-sox2-antibody-12705) 
SOX2-BV421 was validated by the manufacturer by using NCCIT cells. There are 6 citations. 


SOX17-Alexa Fluor® 647, BD Biosciences, 562594 (https://www.bdbiosciences.com/us/applications/research/intracellular-flow/intracellular-antibodies-and-isotype-controls/anti-human-antibodies/alexa-fluor-647-mouse-anti-human-sox17-p7-969/p/562594) 
SOX17-Alexa Fluor® 647 was validated by the manufacturer by using definitive endoderm derived from H9 human embryonic 
stem (ES) cells. There are 5 citations. 


TBX6, R&D Systems, AF4744 (https://www.rndsystems.com/products/human-tbx6-antibody_af4744) 
Anti-TBX6 was validated by the manufacturer by using embryonic mouse mesoderm (E9.5) and JOY6 human induced pluripotent 
stem cells undifferentiated and differentiated into mesoderm. There are 3 citations. 

Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) Human induced pluripotent stem (iPS) cell lines derived from healthy donors, i.e. 1231A3 (derived from commercially 
available peripheral blood) and 201B7 (derived from commercially available human fibroblasts), were used for the majority of 
experiments and obtained from/provided by the Center for iPS Cell Research and Application (CiRA). 

Additionally, human luciferase iPSC-reporter lines (625-A4 and 625-D4), which were utilized for xeno-transplantation 
experiments, were obtained from the Center for iPS Cell Research and Application (CiRA). 

Human iPSC (GCaMP) reporter line (Gen1C), which was used for calcium imaging experiments, was originally established by 
the Conklin Lab at the Gladstone Institute and shared with/provided by the Center for iPS Cell Research and Application 
(CiRA). 
SCDP1 patient sample was obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical 
Research (GM13539) and SCDP2 patient sample was obtained from a collaborating researcher at Meijo Hospital, Nagoya, 
Japan. Patient-derived iPSC lines were generated following strict guidelines of and approval by Kyoto University Graduate 
School and Medical Faculty, and Meijo Hospital, Japan. 

Mouse Epiblast Stem Cells (EpiSCs) were obtained from the RIKEN BioResource Research Center (RIKEN BRC) (#AES0204). 


Authentication Identity of cells generated/utilized in the lab/institute is commonly confirmed by multiple STR analyses using PowerPlex 16 
HS System (Promega). For SCD patient and rescue iPSC clones SNP array analysis was also performed (see Methods section 
for details). Patient-like (knock-out) and patient-derived iPSC lines were also tested for presence of line-specific mutations via 
iPSC genotyping (see Methods section for details). 


Mycoplasma contamination All cell lines were tested negative for mycoplasma infection. 


Commonly misidentified lines No commonly misidentified lines were used. 
(See ICLAC register) 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals NOD/ShiJic-scid Jcl (NOD/SCID) male mice were purchased from CLEA Japan and utilized for experiments at six weeks of age. 

Wild animals No wild animals were used. 

Field-collected samples No field-collected samples were used. 

Ethics oversight Animal experiments were approved by the institutional animal research committee of the Center for iPS Cell Research and 
Application (CiRA), Kyoto University and performed following the guidance of Regulation on Animal Experimentation at Kyoto 
University. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics SCDP1 | Patient with mutations in MESP2; M-SDV-G (SCD2) 


SCDP2 | Patient with mutation in DLL3; M-SDV-G (SCD1) 


Recruitment Patient samples utilized in this study were either obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute 
for Medical Research (SCDP1) or Meijo Hospital, Japan (SCDP2) based on provided clinical/radiological data indicating a patient 
with segmentation defects of the vertebrae (SDV). 


SCDP1 | Patient with mutations in MESP2; M-SDV-G (SCD2): 
Patient was diagnosed with spondylothoracic dysostosis, malsegmentation of the spine, numerous hemivertebrae, "crab thorax" 
and lordosis. Primary tissue samples utilized for derivation of patient iPSCs (SCDP1) were provided by Coriell Institute for Medical 
Research (GM13539). 


SCDP2 | Patient with mutation in DLL3; M-SDV-G (SCD1): 

Patient was diagnosed with spondylothoracic dysostosis, segmentation defects of the vertebrae with involvement of the entire 
spine (sacrum to C1), bilateral fusion of ribs posteriorly, with fanning out in a "crab-like" appearance, mild scoliosis and marked 
reduction of thoracic lordosis. Primary tissue samples utilized for derivation of patient iPSCs (SCDP2) were provided by Meijo 
Hospital, Nagoya, Japan. 


Ethics oversight All experiments followed relevant guidelines and regulations and were approved by Ethics committees of the Kyoto University 
Graduate School and Faculty of Medicine, Kyoto, Japan and Meijo Hospital, Nagoya, Japan. Informed consent was obtained from 
legal guardians of patients by relevant institutions. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 

Flow Cytometry 


Plots 


Confirm that: 


x | The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


x | The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers). 


All plots are contour plots with outliers or pseudocolor plots. 


x | A numerical value for number of cells or percentage (with statistics) is provided. 


Methodology 


Sample preparation Cells were washed with PBS and dissociated using Accutase (Life Technologies) and centrifuged. Cells were resuspended (1.0 x 
10^7 cells/ml) in FACS buffer (0.1% BSA in PBS) and stained with allophycocyanin (APC)-conjugated DLL1 antibody for 30 minutes 
at 4°C. Then, cells were stained with DAPI to eliminate dead cells after washing with FACS buffer once and finally strained 
through a filter mesh. As for the co-staining of intracellular molecules TBX6 and BRACHYURY with DLL1, cells were fixed with 4% 
paraformaldehyde (PFA) for 20 minutes at 4°C after initial staining with DLL1 antibody and washed twice with staining medium, 
which contained PBS with 2% fetal bovine serum (FBS). Samples were permeabilized with BD Perm/Wash buffer (BD Biosciences) 
for 15 minutes at room temperature and stained with TBX6 primary antibody or phycoerythrin (PE)-conjugated BRACHYURY 
antibody for 60 minutes at room temperature and washed with BD Perm/Wash buffer twice. The cells stained with TBX6 
antibody were stained with Alexa Fluor® 488-conjugated secondary antibody for 60 minutes at room temperature. The samples 
were washed with BD Perm/Wash buffer twice and suspended into staining medium. For FACS-based evaluation of 
undifferentiated PSCs and their differentiation capacity into three germ layers, cells (1.0 x 10^6 cells each) were fixed with 4% 
paraformaldehyde phosphate buffer solution (4% PFA/PBS) for 20 minutes at 4°C and washed twice with staining medium, which 
contained PBS with 2% fetal bovine serum (FBS). Samples were permeabilized with BD Perm/Wash buffer (BD Biosciences) for 15 
minutes at room temperature and stained with fluorescence-conjugated primary antibodies listed in Supplementary Table 8.3. 
The samples were washed with BD Perm/Wash buffer twice and suspended into staining medium. 

Instrument Flow cytometric analysis was performed using LSR or BD FACSAria II cell sorter (BD Biosciences). 
Software FACS data were analyzed and graphs were generated using FlowJo software (FlowJo LLC, version 10.6.1). 
Cell population abundance Abundance of distinct cell populations of interest was determined using appropriate negative controls. 


Gating strategy Standard gating settings commonly utilized at FACS core facility of the institute were used. Besides using appropriate isotype 
controls, negative (control) cell samples were utilized to set appropriate gates and determine true positive cell populations. 


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 

Caspase-dependent apoptosis accounts for approximately 90% of homeostatic cell 
turnover in the body, and regulates inflammation, cell proliferation, and tissue 
regeneration. How apoptotic cells mediate such diverse effects is not fully 
understood. Here we profiled the apoptotic metabolite secretome and determined its 
effects on the tissue neighbourhood. We show that apoptotic lymphocytes and 
macrophages release specific metabolites, while retaining their membrane integrity. 
A subset of these metabolites is also shared across different primary cells and cell lines 
after the induction of apoptosis by different stimuli. Mechanistically, the apoptotic 
metabolite secretome is not simply due to passive emptying of cellular contents and 
instead is a regulated process. Caspase-mediated opening of pannexin 1 channels at 
the plasma membrane facilitated the release of a select subset of metabolites. In 
addition, certain metabolic pathways continued to remain active during apoptosis, 
with the release of only select metabolites from a given pathway. Functionally, the 
apoptotic metabolite secretome induced specific gene programs in healthy 
neighbouring cells, including suppression of inflammation, cell proliferation, and 
wound healing. Furthermore, a cocktail of apoptotic metabolites reduced disease 
severity in mouse models of inflammatory arthritis and lung-graft rejection. These 
data advance the concept that apoptotic cells are not inert cells waiting for removal, 
but instead release metabolites as 'good-bye' signals to actively modulate outcomes 
in tissues. 


Apoptosis occurs during development, homeostatic tissue turnover, 
and in pathological settings. Besides the known responses of 
phagocytes that engulf apoptotic cells, the apoptotic process itself 
(independent of phagocytosis) can modulate physiological events, 
such as embryogenesis and tissue regeneration, with pathologies 
arising when apoptosis is inhibited. However, the mechanisms by 
which apoptotic cells themselves mediate these functions are not well 
understood. As apoptotic cells remain intact for a period of time, they 
could release soluble metabolites that diffuse within a tissue to 
influence neighbouring cells. Although a few soluble factors from apoptotic 
cells are reported as 'find-me' signals to attract phagocytes, the full 
apoptotic secretome is not yet defined. 

To profile the metabolite secretome of apoptotic cells, we used 
human Jurkat T cells, primary mouse thymocytes, or primary mouse 
bone-marrow-derived macrophages (BMDMs), all of which can undergo 
inducible, caspase-dependent apoptosis (caused by ultraviolet (UV)-light 
treatment, anti-Fas antibody crosslinking, or treatment with 
anthrax lethal toxin) (Fig. 1a). As untargeted metabolomics requires 
large numbers of cells, we optimized the parameters using Jurkat cells 
(for example, cell density, culture volume and duration after 
apoptosis), such that approximately 80% of the cells were apoptotic, while 
maintaining cell membrane integrity (annexin V+ 7AAD−) (Extended Data 
Fig. 1a, b). Supernatants and cell pellets from apoptotic cells and live cell 
controls were subjected to untargeted metabolomic profiling against a 
library of more than 3,000 biochemical features or compounds. 
Supernatants of apoptotic Jurkat cells (induced by UV irradiation) showed 
an enrichment of 123 metabolites (Fig. 1b, Extended Data Fig. 1c, d, 
Supplementary Table 1), and 85 of these 123 were reciprocally reduced 
in the apoptotic cell pellets (Extended Data Fig. 2a-f, Supplementary 
Table 2). 

In untargeted metabolomics of supernatants from macrophages 
undergoing apoptosis (induced via anthrax lethal toxin), we detected 
fewer metabolites (20 versus 123 in Jurkat cells), perhaps owing to 
differences in cell types, modality of death and/or quantities released 
(below detection limits). Notably, 16 of the 20 metabolites (80%) were 
shared with apoptotic Jurkat cells (Fig. 1b). 
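
The overlap quoted above (16 of the 20 macrophage metabolites also found among the 123 Jurkat metabolites) is a plain set intersection. The following is a minimal Python sketch of that arithmetic, not the authors' analysis: the function name `shared_fraction` and all identifiers such as `met_000` or `mac_only_0` are hypothetical placeholders, and the real metabolite lists are those in Fig. 1b and Supplementary Table 1.

```python
def shared_fraction(reference, query):
    """Return (number shared, fraction of `query` also present in `reference`)."""
    shared = set(reference) & set(query)
    return len(shared), len(shared) / len(set(query))


# Placeholder identifiers only; these are NOT the metabolites from the study.
jurkat = {f"met_{i:03d}" for i in range(123)}      # 123 enriched in Jurkat supernatants
macrophage = (
    {f"met_{i:03d}" for i in range(16)}            # 16 also seen in Jurkat cells
    | {f"mac_only_{i}" for i in range(4)}          # 4 detected in macrophages only
)

n_shared, fraction = shared_fraction(jurkat, macrophage)
print(n_shared, f"{fraction:.0%}")  # 16 80%
```

The fraction is taken relative to the smaller (macrophage) list, which matches how the 80% figure is stated in the text.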

For further validation and quantification, we performed targeted 
metabolomics and analysed 116 specific metabolites (Methods) in the 
supernatants from Jurkat cells and primary mouse thymocytes after 
Fas-crosslinking (extrinsic cue for apoptosis) (Supplementary Table 3). 
This targeted panel included 43 of the metabolites released from 
apoptotic Jurkat cells (identified above), and included a 5-kDa filtering step 
to exclude proteins and extracellular vesicles. This targeted analysis 
showed an enrichment of many metabolites seen with UV-induced 
apoptosis (Fig. 1b). Furthermore, metabolites released from apoptotic 
primary thymocytes overlapped with apoptotic Jurkat cells (Fig. 1b). 
Comparison of metabolites enriched or released in the apoptotic 
supernatant of Jurkat cells, thymocytes and macrophages (after Fas-, UV- or 
toxin-mediated apoptosis) identified five conserved metabolites: 
adenosine monophosphate (AMP), guanosine 5′-monophosphate (GMP), 
creatine, spermidine and glycerol-3-phosphate (Fig. 1b, Extended Data 
Fig. 3a). ATP represents the sixth shared metabolite (via luciferase 
assay) (Extended Data Fig. 3b), although ATP was not profiled in the 
metabolomic analyses for technical reasons. 

To test other cell types and apoptotic modalities, we analysed the 
release of four conserved metabolites via analytical kits. Jurkat cells, 
A549 lung epithelial cells and HCT116 colonic epithelial cells were 
induced to undergo death via different apoptotic cues, such as UV 
radiation, treatment with the BH3-mimetic ABT-737 (which directly 
induces permeabilization of the mitochondrial outer membrane), 
and/or treatment with TRAIL (the cell extrinsic pathway) (Fig. 1c-e). 
We could readily detect apoptosis-dependent release of the tested 
metabolites, and attenuation by pan-caspase inhibitor zVAD (Fig. 1c-e, 
Extended Data Fig. 3c). The metabolites detected were not due to simple 
emptying of cellular contents during apoptosis, as many metabolites 
at high intracellular concentrations were not released (Fig. 1f). These 
data reveal apoptotic cells as a natural source of many metabolites 
with biological functions. 

During the above analyses, we noted that despite the many 
cellular metabolites detected in the pellet, only a subset is released; 
furthermore, even within a known metabolic pathway, only some were 
released. Such selectivity could arise from specific channels that open 
during apoptosis to release certain metabolites, and/or continued 
metabolic activity within the dying cell influencing the secretome. To 
test specific channels, we focused on pannexin 1 (PANX1) channels 
that are activated during apoptosis by caspase-mediated cleavage 
and can conduct ions and small molecules up to 1 kDa in size across 
the plasma membrane. In a PANX1-dependent manner, apoptotic 
cells (but not live cells) take up the nucleic acid stain TO-PRO-3 
(671 Da), whereas 7AAD (1.27 kDa) is excluded (Extended Data Fig. 4a, 
b). We tested the relevance of PANX1 by genetic and pharmacological 
approaches. Genetically, we used Jurkat cells expressing a 
dominant-negative PANX1 with a mutation in the caspase cleavage site 
(PANX1-DN) or primary thymocytes from PANX1-deficient mice (Panx1−/−). 
We also used two pharmacological inhibitors, trovafloxacin (Trovan) 
and spironolactone, which had previously been identified in unbiased 
screens. Disrupting PANX1 activity per se did not affect apoptosis 
(Extended Data Fig. 5a-e). Untargeted metabolomics of the 
supernatants from apoptotic Jurkat cells (UV-induced) with and without PANX1 
inhibition revealed that PANX1 contributed to the release of approximately 
20% of the apoptotic metabolites (25 out of 123) (Fig. 2a, Extended 
Data Fig. 6a). The PANX1-dependent metabolites included 
nucleotides, nucleotide-sugars, and metabolites linked to energy metabolism 
and amino acid metabolism; notably, most have not previously been 
reported to permeate through PANX1. A similar PANX1-dependent 
metabolite signature was shared between Jurkat cells and thymocytes; 
furthermore, as not all apoptotic metabolites released were 
PANX1-dependent, other mechanisms of metabolite release from apoptotic 
cells must also exist (Extended Data Fig. 6b-e). We noted eight shared 
PANX1-dependent apoptotic metabolites between Jurkat cells and 
primary thymocytes (Fig. 2b, Extended Data Fig. 7). 

Fig. 1 | Conserved metabolite secretome from apoptotic cells. a, Schematic 
for assessing apoptotic metabolite secretomes. b, Venn diagrams illustrating 
the shared apoptotic metabolites identified across cell types, modalities of 
apoptosis induction, and the two metabolomic platforms tested, and the list of 
five shared metabolites plus ATP. G3P, glycerol-3-phosphate. c-e, Metabolite 
release from Jurkat T cells (n = 3 for ATP-UV, spermidine-UV + zVAD, 
spermidine-ABT, and spermidine-Fas; n = 4 for ATP-ABT, ATP-Fas and 
spermidine-Fas-live; n = 5 for spermidine-UV-live and spermidine-Fas + zVAD) 
(c), A549 lung epithelial cells (n = 3) (d), and HCT-116 colonic 
epithelial cells (n = 3) (e) across different apoptotic stimuli, with or without 
inhibition of caspase using zVAD. f, Several abundant metabolites such as 
alanine (top), pyruvate (middle) and creatinine (bottom) were not released in 
the Jurkat T cell supernatants (n = 4). AC, apoptotic cell. *P < 0.05, **P < 0.01, 
***P < 0.001, ****P < 0.0001, unpaired Student's t-test with Holm-Sidak method 
for multiple t-tests. Data are mean ± s.e.m. (c-e) or mean ± s.d. (for f). 
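
The legend of Fig. 1 names an unpaired Student's t-test with the Holm-Sidak method for multiple t-tests, but the text does not say which software performed the correction. As a hedged illustration only, the standard step-down Holm-Sidak adjustment can be sketched in plain Python as follows; the function name `holm_sidak_adjust` is hypothetical and the P values are invented for the example, not taken from the study.

```python
def holm_sidak_adjust(p_values):
    """Step-down Holm-Sidak adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak correction over the (m - rank) hypotheses still under test
        candidate = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Illustrative raw P values only (not from the paper)
raw = [0.01, 0.04, 0.03]
adjusted = holm_sidak_adjust(raw)
print([round(p, 4) for p in adjusted])  # [0.0297, 0.0591, 0.0591]
```

A comparison is called significant at level alpha when its adjusted P value falls below alpha, so with alpha = 0.05 only the first of these three made-up comparisons would pass.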

To test whether the apoptotic secretome might also be influenced 
by the metabolic activity within the dying cell, we chose the polyam- 
ine pathway for several reasons. First, the polyamine spermidine was 
released in considerable quantities from apoptotic Jurkat cells, mac- 
rophages, thymocytes and epithelial cells after different modes of 
apoptosis induction (Fig. 2c). Second, among the two metabolites 
immediately upstream of spermidine, putrescine was not released, 
Fig. 2 | Activation of PANX1 and continued metabolic activity of dying cells
orchestrate metabolite release. a, PANX1-dependent metabolite release.
Heat map produced from untargeted metabolomics of supernatants from
Jurkat T cells representing the metabolites that were statistically enriched or
reduced (P < 0.05, two-sided Welch’s two-sample t-test) in the apoptotic
supernatants relative to live supernatants, and after inhibition of PANX1 by
PANX1-DN or the PANX1 inhibitors spironolactone (Spiro.) or trovafloxacin
(Trovan). Metabolites are grouped by pathway. Charge and relative sizes of
the metabolites are also shown (n = 4). 4-HPPA, 4-hydroxyphenylpyruvic
acid; αKG, α-ketoglutarate; GalNAc, N-acetylgalactosamine; GlcNAc,
N-acetylglucosamine; PEP, phosphoenolpyruvate. b, Three-way Venn diagram
(left) illustrating the eight PANX1-dependent apoptotic metabolites observed
(right) among the cell types and apoptotic modalities tested. ATP (not detected
here) represents the ninth metabolite. c, Spermidine concentration per million
cells in supernatants from targeted metabolomics in Jurkat cells (after 4 h Fas


whereas ornithine was present comparably in live and apoptotic cell 
supernatants (Fig. 2d). Third, although exogenous supplementation 
of spermidine can reduce inflammation and improve longevity, sper-
midine release from apoptotic cells provides the first natural or physi-
ological extracellular source of this polyamine. 

The upstream steps of spermidine generation involve arginine to orni- 
thine to putrescine to spermidine, with each conversion regulated by 
specific enzymes. A recent report has shown that although most mRNA
gets degraded in apoptotic HCT-116 cells, a small fraction is retained. In
our re-analysis of this mRNA dataset, the polyamine pathway enzyme 
transcripts were not degraded during apoptosis, including spermidine 
synthase (SRM) that converts putrescine to spermidine (Extended
Data Fig. 8a). We confirmed that in apoptotic Jurkat cells, the mRNA 
for spermidine synthase (SRM) was retained (Extended Data Fig. 8b). To 
address this more directly by metabolic flux labelling, we added medium 
containing [¹³C]arginine to Jurkat cells immediately before the induction
of apoptosis, and traced incorporation of the label into putrescine and
spermidine for the next few hours (Fig. 2e). Apoptotic cells displayed
increased incorporation of the ¹³C label into the polyamine pathway
in the first hour, compared with live cells. After normalizing for total
label incorporation and focusing on the carbons within the polyamine
crosslinking) (left) (n = 3) or primary thymocytes with Panx1 deletion (after
1.5 h Fas crosslinking) (right) (n = 3). ***P = 0.0002, ****P = 0.0001, one-way
analysis of variance (ANOVA) with Tukey’s multiple comparison test. ND, not
determined; No Tx, no treatment. d, Left, schematic of the polyamine
metabolic pathway. Right, relative amounts of ornithine (top), putrescine
(middle) and spermidine (bottom) in supernatants of Jurkat T cells in live and
apoptotic conditions, with or without PANX1 inhibition (n = 4). ****P = 0.0001,
one-way ANOVA with Tukey’s multiple comparison test. e, f, Active polyamine
metabolic activity during apoptosis. Experimental layout for [¹³C]arginine
labelling (e), and incorporation of ¹³C-labelled arginine into the polyamine
pathway intermediates putrescine (f, left) or spermidine (f, right) after the
induction of cell death (n = 6). MS, mass spectrometry. *P = 0.025, ***P = 0.0003,
unpaired Student’s t-test with Holm-Sidak method for multiple t-tests. Data are
mean ± s.e.m.


pathway (Methods), apoptotic cells showed 40% and 25% greater incor-
poration of the ¹³C label per minute into putrescine and spermidine,
respectively, during the first hour (Fig. 2f). Although this dipped during the
second hour, it remained comparable to that in live cells. In addition, ¹³C-labelled
spermidine was detectable in the supernatants of apoptotic cells, and
this was partially reduced by the inhibition of caspases (Extended Data
Fig. 8c). Notably, despite its active generation (revealed by ¹³C-labelling
analysis), putrescine was not detected in apoptotic cell supernatants
from Jurkat cells (or in the macrophage or thymocyte datasets) (Fig. 2d).
Thus, apoptotic cells orchestrate the generation and release of select 
metabolites at least at two levels: caspase-dependent opening of specific 
channels (PANX1) and continued metabolic activity of certain pathways. 

To test whether released metabolites derived from apoptotic 
cells signal to alter gene expression programs in healthy nearby 
cells such as phagocytes, we added supernatants from live or apop- 
totic Jurkat cells (same conditions as untargeted metabolomics) to 
phagocytic LR73 cells, a Chinese hamster ovary cell line that is use-
ful for determining mechanisms or responses after efferocytosis
(Fig. 3a). RNA sequencing (RNA-seq) analysis of LR73 cells (after 4 h) 
indicated distinct transcriptional changes (Fig. 3b, Extended Data 
Fig. 9a). Pathway analysis, by curating each of the hits individually, 
Fig. 3 | Metabolites from apoptotic cells influence gene programs in live
cells. a, Schematic for assessing gene induction by apoptotic cell supernatants
in LR73 cells. b, Gene expression programs induced in phagocytes by the
apoptotic secretome. Display shows the differentially regulated genes (1,852
total, 886 upregulated, 966 downregulated), categorized per known or
predicted function(s), literature and sequence similarity. Circle size is
proportional to the number of differentially expressed genes (n = 4) (P < 0.05).
OXPHOS, oxidative phosphorylation; UPR, unfolded protein response. c,


together with commonly used analysis software, revealed that the 
apoptotic secretome altered gene programs linked to cytoskeletal 
rearrangements, inflammation, wound healing or tissue repair, anti- 
apoptotic functions, metabolism and the regulation of cell size within 
the phagocyte (Fig. 3b), providing a molecular and metabolic basis for 
how apoptosis may influence essential tissue processes. 

By comparing gene programs induced in live cells by supernatants 
from apoptotic cells versus conditions with genetic inhibition of PANX1, 
we identified 110 genes as differentially regulated in phagocytes by
PANX1-dependent apoptotic metabolites (82 up and 28 down) (Fig. 3c); 
these include genes involved in anti-inflammatory processes, anti- 
apoptotic pathways, metabolism, and actin rearrangement (Fig. 3c). 
Secondary validation via quantitative PCR (qPCR) indicates that PANX1- 
dependent metabolites can alter genes linked to anti-inflammatory 
roles in phagocytes (Nr4a1 and Pbx1), wound healing (Areg and
Ptgs2), and metabolism (Slc14a1, Sgk1 and Uap1) (Fig. 3d,
Extended Data Fig. 9b). Furthermore, filtration of supernatants through 
3-kDa filters before the addition to phagocytes showed similar changes 
in gene transcription (Extended Data Fig. 9c), ruling out larger proteins 
or vesicles from dying cells. Thus, metabolites released from apoptotic 
cells, a subset of which is released in a PANX1-dependent manner, can
alter selective gene programs in the surrounding cells that sense these 
metabolic signals. 

To test whether apoptotic PANX1-dependent metabolites can induce 
gene expression changes in tissue phagocytes in vivo, we used Panx1fl/fl
Differentially regulated genes in phagocytes in response to apoptotic cell
supernatants with or without inhibition of the PANX1 channel (82 upregulated,
28 downregulated) (n = 4). d, Validation of genes regulated by PANX1-
dependent metabolites. LR73 cells were incubated with indicated
supernatants for 4 h, and expression of Areg (n = 7), Nr4a1 (n = 7), Uap1 (n = 4),
and Pbx1 (n = 5) was determined in phagocytes by qPCR. AU, arbitrary units.
*P = 0.014, **P = 0.009, ***P = 0.0008, ****P = 0.0001, one-way ANOVA with
Tukey’s multiple comparison test. Data are mean ± s.e.m.


Cd4-cre mice, in which Panx1 is targeted for deletion only within the
thymocytes and not the thymic myeloid cells (Extended Data Fig. 10a, 
left). After confirming that Panx1 was not deleted in the macrophages 
and dendritic cells (Extended Data Fig. 10a, right), and that comparable 
dexamethasone-induced thymocyte apoptosis occurs in control and 
Panx1fl/flCd4-cre mice (Extended Data Fig. 10b, c), we isolated CD11b⁺
macrophages and CD11c⁺ dendritic cells from the thymus and analysed
changes in gene expression (Extended Data Fig. 10d, e). In wild-type 
mice, dexamethasone-induced apoptosis of thymocytes resulted 
in increased expression of Uap1, Ugdh and Pbx1 in surrounding live 
myeloid cells (linked to anti-inflammatory macrophage skewing or 
glycosylation and transcription of Il10) (Fig. 4a). This response was
attenuated in mice lacking PANX1 channels in the dying thymocytes 
(Fig. 4a). Thus, apoptotic PANX1-dependent metabolites can induce 
gene expression changes in the surrounding tissue myeloid cells in vivo.

When tested individually, many of the metabolites did not strongly 
induce anti-inflammatory and tissue-repair genes from the RNA-seq 
(not shown). As these metabolites are concurrently released from 
apoptotic cells (Fig. 1), we then tested mixtures of six out of the eight 
PANX1-dependent metabolites (Fig. 2b) in two combinations: (i) sper- 
midine, fructose-1,6-bisphosphate (FBP), dihydroxyacetone phosphate 
(DHAP), UDP-glucose, GMP and inosine-5’-monophosphate (IMP); and 
(ii) spermidine, GMP and IMP (Fig. 4b). All six have been previously 
administered in vivo in mice or rats without toxicity (Supplementary 
Table 4). We excluded AMP and glycerol-3-phosphate, as AMP can 
Fig. 4 | PANX1-dependent metabolite release during apoptosis modulates
phagocyte gene expression in vivo and can alleviate inflammation. a, PANX1
expression in apoptotic thymocytes influences gene expression in myeloid
cells in vivo. Control mice (Panx1fl/flCd4-cre⁻) or mice lacking PANX1 in
thymocytes (Panx1fl/flCd4-cre⁺) were injected with dexamethasone (Dex) to
induce apoptosis in thymocytes (cre⁻ PBS n = 3, cre⁻ Dex n = 6, cre⁺ PBS, Dex
n = 4). After 6 h, CD11b⁺ CD11c⁺ phagocytes were purified for qPCR analysis of
Uap1 (*P = 0.032, ****P < 0.0001), Pbx1 (****P = 0.0001, *P = 0.0103), and Ugdh
(****P < 0.0001). P values were determined by one-way ANOVA with Tukey’s
multiple comparison test. b, PANX1-dependent release of metabolites from
apoptotic cells was compared across cell types and apoptotic conditions to
design different metabolite mixtures, Memix-6 (blue) and Memix-3 (purple).
c, Memix-6 (n = 6) and Memix-3 (n = 4) solutions mimic gene expression changes
in phagocytes induced by apoptotic supernatants. *P < 0.05, **P < 0.01,
****P < 0.0001, unpaired two-tailed Student’s t-test. d, Top, schematic of
arthritis induction and treatments (vehicle n = 16, Memix-6 n = 11, Memix-3
n = 12 mice). I.P., intraperitoneal. Middle, paw swelling was measured using a
calliper and reported as the percentage change compared to day 0.


be converted to adenosine, a known anti-inflammatory metabolite, 
and it was difficult to determine the optimal in vivo dose for glycerol- 
3-phosphate. The metabolite mixtures were quite potent in inducing 
gene expression in vitro, including genes linked to anti-inflammatory 
macrophage skewing or glycosylation (Uap1 and Ugdh), transcription
of Il10 and inflammation resolution (Pbx1 and Ptgs2), and metabolic
processes (Slc14a1 and Sgk1), some of which have also been shown to
be involved in phagocytosis (Fig. 4c). For simplicity, we have denoted
the metabolite mixtures as ‘Memix-6’ and ‘Memix-3’ (Fig. 4b). 

Given the anti-inflammatory gene signature induced by the metabo- 
lites, we next tested whether the Memix-6 and/or Memix-3 metabolites 
**P = 0.0028, ***P = 0.0003. Bottom, scores were assessed on a scale of 1 to 4 per
paw. ***P = 0.0004, ****P = 0.0001. P values determined by two-way ANOVA.
e, Ankle inflammation and bone erosion were scored via haematoxylin and
eosin (H&E) staining (left) and safranin O staining (right), respectively, from
arthritic mouse paws on day 8 (‘peak’ disease). Increased magnifications of
affected areas are shown. Scale bars, 0.4 mm (main panel), 0.1 mm
(magnification). f, Clinical analysis of inflammation (left), bone erosion
(middle) and cartilage erosion (right) was scored by an investigator blinded to
treatments (vehicle n = 6, Memix-3 n = 7). ****P < 0.0001, unpaired two-tailed
Student’s t-test. g, Memix-3 metabolite solution alleviates inflammation in a
minor antigen-mismatch lung transplant model. Orthotopic left lung
transplantation from C57BL/10 mice into C57BL/6 recipient mice, with
Memix-3 administered on post-operation days 1 and 3. Lungs were obtained for
histological scoring on day 7. h, H&E staining (left) and ISHLT rejection score
(right) in mice as in g (vehicle n = 6, Memix-3 n = 6). *P = 0.024, unpaired two-
tailed Student’s t-test. Data are mean ± s.e.m. (a, c, d) or mean ± s.d. (f, h). Scale
bars, 100 μm.


attenuated inflammation in vivo in two contexts: a model of inflamma- 
tory arthritis and a model of lung-transplant rejection. In the arthritis
model, a single injection of serum from arthritic transgenic K/BxN
mice into C57BL/6J mice results in inflammation of the joints with
progressive arthritic symptoms, followed by disease resolution. Of
relevance to our question, this arthritis model is dependent on myeloid
cells, with apoptosis known to occur during disease. We first asked
whether the full apoptotic secretome could alleviate inflammation in 
this arthritis model, and found that this was the case (Extended Data 
Fig. 10f). Administration of Memix-6 or Memix-3 metabolites after the 
induction of arthritis when the disease symptoms are already noticeable 


resulted in significant attenuation of paw swelling and other arthritic 
parameters, compared with treatment with vehicle controls (Fig. 4d). 
Because FBP alone can have ameliorative roles in arthritis”, we further 
tested Memix-3, which does not contain FBP. Memix-3 metabolites not 
only alleviated paw swelling and external clinical arthritis parameters, 
but also significantly protected the joints from inflammation, bone 
erosion and cartilage erosion (Fig. 4e, f). 

We also tested Memix-3 in a model of lung-transplant rejection, in 
which local innate and adaptive immune responses orchestrated by 
graft-resident antigen-presenting myeloid cells dictate graft accept- 
ance or rejection. We transplanted allografts from the left lung of 
C57BL/10 mice to a minor antigen-mismatched C57BL/6 recipient” 
(Fig. 4g), and treated the graft recipients with Memix-3 or saline vehicle 
control on post-operative days 1 and 3. On day 7 after engraftment, 
the control mice treated with saline showed severe acute rejection 
of allografts”*. Notably, mice treated with Memix-3 had only minimal 
inflammation in the transplanted lungs (Fig. 4h), suggestive of ame- 
lioration of lung rejection. Complementary flow cytometric analysis 
of the lung showed reduced CD4 and CD8 cells in the transplanted 
lungs of mice treated with Memix-3 (data not shown). Thus, a subset 
of apoptotic metabolites can be harnessed for beneficial effects in two 
different inflammatory settings in vivo. 

Collectively, the data presented here advance several concepts. First, 
we identify specific metabolites that are released from apoptotic cells 
(different cell types and modes of apoptosis induction); the speci- 
ficity could arise from metabolic changes in the apoptotic cells (for 
example, sustained production of spermidine), and/or the opening 
of specific channels (such as PANX1). Second, apoptotic cells are not 
inert corpses awaiting removal; instead, via the release of metabolites 
as good-bye signals they actively modulate several gene programs in 
the neighbouring cells within a tissue. Third, the ability of a cocktail of 
apoptotic metabolites to attenuate arthritic symptoms and the rejec- 
tion of lung transplantation provide a proof-of-concept that it is pos- 
sible to harness the beneficial therapeutic properties of apoptosis in 
specific inflammatory conditions. 
Methods


Reagents 

Trovafloxacin, spironolactone, dexamethasone, spermidine, FBP, 
DHAP, IMP and GMP were obtained from Sigma. UDP-glucose was 
obtained from Abcam, and annexin V-Pacific Blue was from BioLegend. 
7AAD, TO-PRO-3, anti-CD11b-PE (clone M1/70), anti-CD11c-PE (clone
N418), and anti-CD16/CD32 (clone 93) were obtained from Invitrogen. 
Antibodies specific for mouse CD95 were obtained from BD. Human 
anti-Fas (clone CH11) was obtained from Millipore. Other reagents 
were obtained as follows: ABT-737 (Abcam), TRAIL (Sigma) and zVAD-
FMK (Enzo).


Mice 

C57BL/10 and C57BL/6J wild-type mice were acquired from Jackson Lab-
oratories. Panx1fl/fl and Panx1−/− mice have previously been described.
To generate mice with deletion of Panx1 in thymocytes, Panx1fl/fl mice
were crossed to Cd4-cre mice (Taconic). KRN T cell receptor (TCR) 
transgenic mice were a gift from D. Mathis and were bred to non-obese 
diabetic (NOD) mice (Jackson Laboratories) to obtain the K/BxN mice, 
which develop progressive spontaneous arthritis”. Serum was col- 
lected from 9-week-old K/BxN mice by cardiac puncture. Animal pro- 
cedures were approved and performed according to the Institutional 
Animal Care and Use Committee (IACUC) at the University of Virginia. 


Apoptosis induction 

Wild-type Jurkat E6.1 (ATCC) or dominant-negative PANX1-express-
ing (PANX1-DN) cells were resuspended in RPMI-1640 containing 1%
BSA, 1% penicillin-streptomycin-glutamine (PSQ), and 10 mM HEPES
and treated with 250 ng ml⁻¹ anti-Fas (clone CH11), 10 μM ABT-737, or
exposed to 150 mJ cm⁻² UV-C irradiation for 1-2 min (Stratalinker). Jurkat
cells were incubated for 4 h after apoptosis induction. For apoptosis
induction in the presence of PANX1 inhibitors, Jurkat cells were treated
with spironolactone (50 μM) or trovafloxacin (25 μM) in RPMI contain-
ing 1% BSA and 1% PSQ.

Primary thymocytes isolated from 4-6-week-old wild-type or Panx1−/−
mice were treated with 5 μg ml⁻¹ anti-Fas (clone Jo2), which was subse-
quently crosslinked with 2 μg ml⁻¹ protein G. Primary thymocytes were
incubated for 1.5 h after apoptosis induction. 

BMDMs from B6Nlrp1b+C1−/−C11−/− mice (C57BL/6J mice that express a
functional Nlrp1b transgene (B6Nlrp1b+)) crossed with mice lacking cas-
pase-1 (C1, also known as Casp1) and caspase-11 (C11, also known as
Casp4) were a gift from M. Lamkanfi’s laboratory. BMDMs were gen-
erated by culturing mouse bone marrow cells in RPMI medium condi-
tioned with 10% dialysed serum and 1% penicillin-streptomycin. The
medium was supplemented with 20 ng ml⁻¹ of purified mouse M-CSF.
Cells were incubated in a humidified atmosphere containing 5% CO₂ for
6 days. BMDMs from wild-type B6 or B6Nlrp1b+C1−/−C11−/− mice were seeded
in 6-well plates and, the next day, either left untreated or stimulated
with anthrax protective antigen (500 ng ml⁻¹, Quadratech)
and anthrax lethal factor (250 ng ml⁻¹, Quadratech). Supernatants from
either untreated or treated BMDMs were collected. Cellular debris was 
removed via centrifugation, and the clarified supernatant was used for 
metabolic profiling. 

A549 cells were treated with 10 μM ABT-737 or exposed to 600 mJ
cm⁻² UV irradiation, and incubated for 24 h. HCT-116 cells were treated
with 10 μM ABT-737 or 100 ng ml⁻¹ TRAIL and incubated for 24 h. All
cells were pre-treated for 10 min with 50 μM zVAD before apoptosis
induction in indicated experiments. All cells were incubated at 37 °C
with 5% CO₂ for indicated times.


Metabolite detection 

Spermidine was measured using a colorimetric kit (Cloud-
Clone) according to the manufacturer’s protocol. In brief, supernatants taken from
cells under specified conditions were centrifuged at 1,000g for 20 min.
All reagents were brought to room temperature before use. Then, 50 μl
of sample was added to each well followed by an equal volume of detection
reagent A and the plate was mixed. Samples were incubated covered for
1 h at 37 °C. Wells were washed with wash solution three times before the
addition of detection reagent B, after which samples were incubated
for another 30 min at 37 °C. Samples were washed five more times.
Substrate solution (90 μl) was then added to each well and incubated
for 10 min at 37 °C, after which stop solution (50 μl) was added, and
the plate was mixed and immediately measured at 450 nm on a plate
reader (Flex Station 3). Analysis was performed by back calculation
to the standard curve, background subtraction and normalization to 
live cell controls. 
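
The analysis steps named above (back calculation to the standard curve, background subtraction and normalization to live cell controls) can be illustrated with a minimal sketch. The function names, the use of piecewise-linear interpolation, and the order in which the blank is subtracted are assumptions made for illustration; the kit protocol and the plate-reader software may fit the curve differently.

```python
from bisect import bisect_left


def interpolate_concentration(absorbance, standards):
    """Back-calculate a concentration from a standard curve.

    `standards` is a list of (concentration, blank-subtracted absorbance)
    pairs. The curve is assumed to be monotonic with distinct absorbance
    values; points are joined by straight lines (an assumption of this sketch).
    """
    pts = sorted(standards, key=lambda p: p[1])  # order by absorbance
    xs = [a for _, a in pts]
    if not xs[0] <= absorbance <= xs[-1]:
        raise ValueError("reading lies outside the standard curve")
    i = max(1, bisect_left(xs, absorbance))
    (c0, a0), (c1, a1) = pts[i - 1], pts[i]
    return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)


def normalized_signal(sample_abs, blank_abs, live_abs, standards):
    """Blank-subtract, back-calculate, then express relative to live cells."""
    sample = interpolate_concentration(sample_abs - blank_abs, standards)
    live = interpolate_concentration(live_abs - blank_abs, standards)
    return sample / live
```

With hypothetical standards `[(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]`, a blank of 0.10, an apoptotic reading of 0.85 and a live-cell reading of 0.35, the sketch reports the apoptotic sample as roughly threefold the live control.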

ATP was measured using a luciferase-based kit (Promega) according to the manu-
facturer’s protocol. All reagents were equilibrated to room temperature
before use. In brief, supernatants taken from cells under specified con-
ditions were immediately moved to ice, and centrifuged at 500g for 5
min. Samples were placed back on ice and 50 μl of samples and 50 μl of
luciferase reagent were mixed in a 96-well opaque plate. Luminescence
was immediately measured on the Flex Station 3. Analysis was performed
by back calculation to the standard curve, background subtraction and 
normalization to live cell controls. 

Glycerol-3-phosphate and creatine were measured on the basis of 
manufacturers’ protocols (Abcam). In brief, supernatants were taken 
from specified culture conditions and spun at 500g. Then, 50 μl of
supernatant was added to a 96-well plate. Detection reagents were 
prepared as indicated in the protocol and added to respective wells. 
Samples were incubated for 40 min or 1h for glycerol-3-phosphate or 
creatine, respectively. Absorbance at 450 nm or fluorescence at excita- 
tion/emission 535/587 nm was measured for glycerol-3-phosphate or 
creatine, respectively. 


Flow cytometry of apoptosis and PANX1 activation 

Apoptotic cells were stained with annexin V-Pacific Blue, 7AAD and 
TO-PRO-3 for 15 min at room temperature in annexin V binding buffer 
(140 mM NaCl, 2.5 μM CaCl₂, 10 mM HEPES) and subjected to flow cytom-
etry on Attune NxT (Invitrogen). Data were analysed using FlowJo v.10
software. 


Metabolomics analysis of apoptotic supernatant and cell pellet 
Sample extraction, processing, compound identification, curation 
and metabolomic analyses were carried out at Metabolon and Human 
Metabolome Technologies (HMT). In brief, supernatants were sepa-
rated from cell pellets via sequential centrifugation and frozen before 
shipment for metabolomic analysis. For HMT, supernatant samples 
were spiked with 10 μl of water with internal standards, then filtered
through a 5-kDa cut-off filter to remove macromolecules and small 
vesicles. Cationic compounds were diluted and measured using positive 
ion mode electrospray ionization (ESI) via capillary electrophoresis- 
time-of-flight mass spectrometry (CE-TOF/MS). Anionic compounds 
were measured in the positive or negative ion mode ESI using capillary
electrophoresis-tandem mass spectrometry (CE-MS/MS). Samples
were diluted to improve the capillary electrophoresis-triple quadru- 
pole mass spectrometry (CE-QqQMS) analysis. Peak identification 
and metabolite quantification were determined using migration time, 
mass-to-charge ratio, and the peak area normalized to the internal 
standard and standard curves. Concentrations reported are on a per
million cell basis, which was derived by back calculations on the cell 
number that was used in the experimental set-up. 
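
The quantification described above (peak area normalized to the internal standard, conversion through a standard, and back calculation to a per-million-cell basis) can be sketched as follows. This is an illustrative outline only: the single-point calibration, the function names and the example numbers are assumptions, not the vendor's actual pipeline.

```python
def relative_peak_area(peak_area, internal_standard_area):
    """Normalize a metabolite peak area to the internal standard peak area."""
    return peak_area / internal_standard_area


def amount_from_standard(sample_rel_area, standard_rel_area, standard_amount):
    """Single-point calibration (an assumption of this sketch): the amount
    scales linearly with the relative peak area of a known standard."""
    return standard_amount * sample_rel_area / standard_rel_area


def per_million_cells(amount, cell_count):
    """Back-calculate an amount to a per-million-cell basis."""
    return amount / (cell_count / 1_000_000)


# Hypothetical numbers: a peak of 2,000 against an internal standard of 1,000,
# a 100 pmol standard with relative area 4.0, and 4 million cells in the set-up.
rel = relative_peak_area(2000.0, 1000.0)
pmol = amount_from_standard(rel, 4.0, 100.0)
pmol_per_million = per_million_cells(pmol, 4_000_000)
```

Reporting on a per-million-cell basis makes values comparable between experiments that start from different cell numbers, such as the Jurkat and thymocyte set-ups.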

For untargeted metabolomics analysis by Metabolon, recovery 
standards were added to samples to monitor quality control of the 
analysis. Samples were precipitated in methanol with shaking for 2 min. 
Samples were then placed on the TurboVap to remove organic solvent
and the samples were stored overnight under nitrogen gas. Samples
were analysed under four different conditions: two for analysis by two
separate reverse phase (RP)/ultra-performance liquid chromatography 


(UPLC)-MS/MS methods with positive ion mode ESI, one for analysis 
by RP/UPLC-MS/MS with negative ion mode ESI, and one for analysis 
by HILIC/UPLC-MS/MS with negative ion mode ESI. Using a library 
based on authenticated standards that contains the retention time/ 
index, mass-to-charge ratio (m/z), and chromatographic data (includ- 
ing MS/MS spectral data) on all molecules in the library (Metabolon), 
the metabolite identification could be performed with reverse scores 
between the experimental data and authenticated standards. Although
there may be similarities based on one of these factors, the use of all
three data points together allows biochemicals to be distinguished and identified.


Metabolite flux experiments with [¹³C]arginine labelling

Cells were re-suspended in arginine-free RPMI medium containing 
10% dialysed serum, supplemented with 1 mM ¹³C₆-labelled L-arginine
HCl (Thermo Fisher Scientific). Cells were either exposed to UV or
left untreated. This step was performed within 1 min of the addition
of medium containing [¹³C]arginine to cells. Cells were then incubated
at 37 °C. Samples were collected every hour to trace the incorporation 
of the label from arginine into the polyamine pathway for both UV- 
exposed and live cells. Where indicated, cells were pre-treated with 
ZVAD-FMK to inhibit caspases. 

Metabolite extraction from the pellet or supernatant was performed
by adding 300 μl of 6% trichloroacetic acid (TCA) to a pellet of 4 mil-
lion cells on ice. The samples were then vortexed thoroughly at 4 °C,
followed by centrifugation to remove cell debris. Supernatant (100 μl)
was mixed with Na₂CO₃ (900 μl of 0.1 M, pH 9.3), followed by isobutyl
chloroformate addition (25 μl). The mixture was incubated at 37 °C
for 30 min and then centrifuged for 10 min at 20,000g. Supernatant
(800 μl) was transferred to a fresh tube, followed by the addition of
1,000 μl diethyl ether and vortexing. The mixture was allowed to sit at
room temperature for 10 min for phase separation, after which 900 μl
of sample was collected in a fresh Eppendorf tube. The samples were
dried via Speedvac. For liquid chromatography-mass spectrometry
(LC-MS) analysis, 150 μl of a 1:1 mixture of 0.2% acetic acid in water and
0.2% acetic acid in acetonitrile was added to the dried sample.


RNA-seq analysis 

LR73 cells (ATCC) were plated at 10° per well in 24-well tissue culture 
plates and cultured for 16 h at 37 °C with 5% CO₂. The cells were then
rinsed with PBS, and fresh supernatants taken from live Jurkat, apop-
totic Jurkat (UV), or PANX1-DN apoptotic Jurkat (UV) cells were added
for 4 h (as described in ‘Apoptosis induction’). Total RNA was col-
lected using the Nucleospin RNA kit (Macherey-Nagel) and an mRNA
library was constructed with the Illumina TruSeq platform. Transcriptome
sequencing using an Illumina NextSeq 500 cartridge was then per-
formed on samples from four independent experiments. RNA-seq data
were analysed using R v1.0.136 and the R package DESeq2 for differential
gene expression, graphical representation, and statistical analysis. 


Quantitative reverse transcription PCR analysis 

RNA was extracted from cells treated with different live or apoptotic 
supernatants. Where indicated, supernatants were filtered through a
3-kDa filter as suggested by the manufacturer’s protocol. In brief, super-
natants were separated from cells and large vesicles via sequential cen-
trifugations. Supernatants were then added to 3-kDa filters (Millipore)
and centrifuged for 1 h at 3,000g before the addition of supernatant
to live LR73 cells. The Nucleospin RNA kit (Macherey-Nagel) was used for
RNA extraction and cDNA was synthesized using the QuantiTect Reverse
Transcription Kit (Qiagen). Expression of the indicated genes was
measured using Taqman probes (Applied Biosystems) and the
StepOnePlus Real Time PCR System (Applied Biosystems).
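
The text does not state how relative expression was computed from the qPCR data. As an illustration only, the sketch below uses the widely used comparative Ct (2^-ΔΔCt) approach; the choice of this method, the reference gene and all Ct values are assumptions and are not taken from the study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by the comparative Ct method (2 ** -ddCt).

    Each target Ct is first normalized to a reference (housekeeping) gene
    measured in the same sample; the treated sample is then compared with
    the control sample. Assumes roughly 100% amplification efficiency.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target gene crosses threshold two cycles earlier
# in cells given apoptotic supernatant than in cells given live supernatant.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```

A two-cycle shift with an unchanged reference gene corresponds to a fourfold increase under these assumptions.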


Thymocyte death induction in vivo 
Six- to eight-week-old Panx1fl/fl or Panx1fl/flCd4-cre mice were injected
intraperitoneally with dexamethasone (250 μg). Thymus was obtained
6 h after injection and single cell suspensions were prepared using
70-μm strainers (Fisher). An aliquot of digested tissue was taken to
measure the extent of thymocyte cell death and PANX1 activation using
annexin V-Pacific Blue, 7AAD, and TO-PRO-3, as described in ‘Flow 
cytometry of apoptosis and PANX1 activation’. Samples were acquired 
on Attune NxT (Invitrogen) and analysed using FlowJo v.10 Software. 


Thymic myeloid cell isolation and gene expression 

Six- to eight-week-old Panx1fl/fl or Panx1fl/flCd4-cre mice were injected with
dexamethasone and single cell suspensions of thymus were prepared as 
described above. After isolation, cells were incubated with anti-CD16/ 
CD32 (Fc-Block, Invitrogen) for 20 min at 4 °C. Cells were then stained 
with anti-CD3-PE and run through a MACS kit using anti-PE microbeads 
to ‘de-bulk’ the cell suspension and remove most thymocytes. Cell flow- 
through (CD3-negative population) was collected and stained with anti- 
CD11b-PE and anti-CD11c-PE antibodies for 30 min at 4 °C. Stained cells 
were purified using the anti-PE MicroBeads MACS kit (Miltenyi Biotec), 
following the manufacturer’s protocol. Sample aliquots were run on the
Attune NxT (Invitrogen) and analysed using FlowJo v.10 Software. Total 
RNA from purified cells was isolated using the Nucleospin RNA kit (Macherey-Nagel)
for cDNA synthesis and quantitative reverse transcription PCR (qRT-PCR),
as described in ‘Quantitative reverse transcription PCR analysis’. 


Memix preparation and in vivo treatment 

The metabolite mixture Memix-6 was composed of the six metabo- 
lites: spermidine, FBP, DHAP, GMP, IMP and UDP-glucose. Memix-3 was 
composed of spermidine, GMP and IMP. Concentrations of metabolites 
used for in vitro LR73 phagocyte treatment were as follows (based on 
targeted metabolomics): IMP (3.3 μM), DHAP (36 μM), FBP (0.5 μM),
GMP (2.1 μM), UDP-glucose (2 μM) and spermidine (0.3 μM). Concentra-
tions of metabolites used for in vivo treatment of mice were as follows: IMP
(100 mg kg⁻¹), DHAP (50 mg kg⁻¹), FBP (500 mg kg⁻¹), GMP (100 mg kg⁻¹),
UDP-glucose (100 mg kg⁻¹) and spermidine (100 mg kg⁻¹).
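
To make the in vivo doses concrete, the sketch below converts the mg kg⁻¹ values listed above into the mass of each metabolite per injection. The 25 g body mass is a hypothetical example, and the dictionary and function names are illustrative only.

```python
# Doses in mg per kg body mass, as listed in the text above.
DOSE_MG_PER_KG = {
    "IMP": 100,
    "DHAP": 50,
    "FBP": 500,
    "GMP": 100,
    "UDP-glucose": 100,
    "spermidine": 100,
}

MEMIX_6 = ["spermidine", "FBP", "DHAP", "GMP", "IMP", "UDP-glucose"]
MEMIX_3 = ["spermidine", "GMP", "IMP"]


def dose_per_mouse_mg(components, body_mass_g):
    """Return the mass (mg) of each component for one mouse of the given mass."""
    return {name: DOSE_MG_PER_KG[name] * body_mass_g / 1000
            for name in components}


# Hypothetical 25 g mouse.
memix3_dose = dose_per_mouse_mg(MEMIX_3, 25)
memix6_dose = dose_per_mouse_mg(MEMIX_6, 25)
```

For a 25 g mouse, each Memix-3 component at 100 mg kg⁻¹ works out to 2.5 mg per injection, and the FBP component of Memix-6 to 12.5 mg.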


K/BxN-induced arthritis

C57BL/6J mice were given intraperitoneal injections of 150 μl of serum
from K/BxN mice on day 0 and paw swelling was measured at indicated
time points using a calliper (Fisher). Measurements are presented as the 
percentage change from day 0. On day 1, mice were randomly assigned 
into three groups and given daily intraperitoneal injections of Memix-3, 
Memix-6 or vehicle up to day 5. In separate experiments, mice on day 
1 were randomly assigned and given daily injections of either live or 
apoptotic supernatants up to day 5. Clinical scores were assigned for
each paw as follows: 0, no paw swelling or redness observed; 1, redness
of the paw or a single digit swollen, normal V shape of the hind foot (the
foot at the base of the toes is wider than the heel and ankle); 2, two or 
more digits swollen or visible swelling of the paw, U shape of the hind 
foot (the ankle and the midfoot are equal in thickness); and 3, reversal 
of the V shape of the hind foot into an hourglass shape (the foot is wider
at the heel than at the base of the toes). A combined clinical score of all 
paws is presented. Paw measurements and clinical score assignments 
were performed by an investigator blinded to the treatment groups. 
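
The two read-outs described above can be expressed as a short sketch: paw swelling as the percentage change from day 0, and a combined clinical score over all paws. Treating the combined score as the sum of per-paw scores is an assumption of this sketch, and the function names and example values are hypothetical.

```python
def percent_change_from_day0(day0_mm, current_mm):
    """Paw swelling expressed as the percentage change from the day 0 value."""
    return 100.0 * (current_mm - day0_mm) / day0_mm


def combined_clinical_score(paw_scores):
    """Combine per-paw scores (each 0 to 3, as defined above) for one mouse.

    Summation is an assumption; the text states only that a combined score
    of all paws is presented.
    """
    for score in paw_scores:
        if score not in (0, 1, 2, 3):
            raise ValueError(f"invalid paw score: {score}")
    return sum(paw_scores)


# Hypothetical mouse: a paw measuring 2.0 mm on day 0 and 2.5 mm later,
# with per-paw clinical scores of 0, 1, 2 and 3.
swelling = percent_change_from_day0(2.0, 2.5)
score = combined_clinical_score([0, 1, 2, 3])
```

Expressing swelling relative to each animal's own day 0 value removes differences in baseline paw thickness between mice.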


Lung transplant rejection model 
Orthotopic left lung transplantation was carried out according to pre-
vious reports. To study the alteration of allo-immune response by a
minor antigen-mismatched combination, C57BL/10 donor and C57BL/6
recipient mice were used. The recipient mice were administered
Memix-3 or vehicle intraperitoneally on post-operative days 1 and 3. 
On day 7, the recipient mice were euthanized and left lung allografts 
were obtained and processed for histology. 


Histology 
Lungs were fixed in formalin, sectioned and stained with H&E. Acute rejection was graded according to the International Society for Heart and Lung Transplantation (ISHLT) A grade criteria by a lung pathologist who was blinded to the experimental settings. For the model of arthritis, mice were euthanized on day 8 of K/BxN-serum-induced arthritis and the hind paws were fixed in 10% formalin (Fisher). Decalcification, sectioning, paraffin embedding, H&E staining and safranin O staining were performed by HistoTox Labs. Images of ankle sections were taken on an EVOS FL Auto (Fisher) and analysed using the accompanying software. Histology scoring was performed by an investigator blinded to the mouse treatment. For inflammation and cartilage erosion scoring, the following criteria were used: 0, none; 1, mild; 2, moderate; and 3, severe. For bone erosion scoring, the following criteria were used: 0, no bone erosions observed; 1, mild cortical bone erosion; 2, severe cortical bone erosion without the loss of bone integrity; and 3, severe cortical bone erosion with the loss of cortical bone integrity and trabecular bone erosion.


Statistical analysis 

Statistical significance was determined using GraphPad Prism 7, using Student's two-tailed t-test (paired or unpaired), one-way ANOVA or two-way ANOVA, according to test requirements. Grubbs' outlier test was used to identify outliers, which were excluded from the final analysis. *P < 0.05, **P < 0.01, ***P < 0.001. No statistical methods were used to predetermine sample size. Unless otherwise stated, experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.
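
As a small illustration of the conventions stated above, the sketch below maps a P value to the asterisk notation and computes the Grubbs' test statistic for a sample. It is not the GraphPad Prism implementation: the function names are hypothetical, and the statistic still has to be compared with a tabulated critical value (not computed here) before a point is called an outlier.

```python
import statistics


def significance_stars(p_value):
    """Map a P value to the asterisk notation used in the Methods."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "n.s."


def grubbs_statistic(values):
    """Return G = max|x - mean| / s, the Grubbs' test statistic for one outlier."""
    if len(values) < 3:
        raise ValueError("Grubbs' test needs at least three values")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        raise ValueError("all values are identical")
    return max(abs(v - mean) for v in values) / sd
```

A larger G means that the most extreme value lies further from the sample mean in units of the sample standard deviation.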


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


RNA-seq data have been submitted to the Gene Expression Omnibus (GEO) under accession number GSE131906. Source Data for Figs. 1-4 and Extended Data Figs. 1-10 are provided with the paper. Other data that support the findings of this study are available from the corresponding author upon request.

Extended Data Fig. 1 | Metabolite release from apoptotic Jurkat cells. a, Jurkat cells were induced to undergo apoptosis by UV irradiation. Staining with 7AAD and annexin V (AV) was used to determine the percentage of live (AV−7AAD−), apoptotic (AV+7AAD−) or necrotic (AV+7AAD+) cells after 4 h. b, Quantitative analysis of apoptosis (top) and secondary necrosis (bottom) (n = 4). Data are mean ± s.d. ****P < 0.0001, unpaired two-tailed Student's t-test. c, d, Volcano plot (c) and heat map (d) from untargeted metabolomics of supernatants from Jurkat T cells, representing statistically enriched or reduced (P < 0.05, two-sided Welch's two-sample t-test) metabolites in the apoptotic supernatants relative to live supernatants. Data are representative of four biological replicates.

Extended Data Fig. 2 | Reciprocal metabolite changes between apoptotic supernatant and pellet. a, Heat map produced from untargeted metabolomics of Jurkat T cell pellets, representing statistically enriched or reduced (P < 0.05, two-sided Welch's two-sample t-test) metabolites in the apoptotic pellet relative to the live cell pellet (n = 4 biologically independent samples). b, Bi-directional plot representing the 85 metabolites that were statistically enriched in the apoptotic supernatant and simultaneously reduced in the apoptotic cell pellet relative to live cell conditions (P < 0.05, two-sided Welch's two-sample t-test). Metabolites were grouped by metabolic pathways (n = 4 biologically independent samples). c-f, Mass spectrometry was used to determine the relative amount of spermidine (c), inosine (d), UDP-glucose (e) and AMP (f) in supernatants and cell pellets from Jurkat T cells in live and apoptotic conditions (n = 4 biologically independent samples). *P = 0.014, ****P < 0.0001, unpaired two-tailed Student's t-test. Data are mean ± s.d.

Extended Data Fig. 3 | Conserved metabolite release during apoptosis. a, Mass spectrometry was used to measure the concentration of the five metabolites that were released across all conditions and platforms tested, in live or apoptotic supernatants per million Jurkat T cells or isolated primary thymocytes (back-calculated from total cells used in experimental set-up) (n = 3). Metabolites are grouped by metabolic pathways. Data are mean ± s.d. *P = 0.014, **P = 0.0014, ***P = 0.0002, ****P < 0.0001, unpaired two-tailed Student's t-test. b, The concentration of ATP released in the supernatant across the different apoptotic Jurkat cells was determined by luciferase assay (n = 4). Data are mean ± s.e.m. ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test. c, Table outlining the different cell types, apoptotic stimuli, techniques and metabolites screened for untargeted (more than 3,000 features or compounds) and targeted (116 metabolites) metabolomics, including ATP, spermidine, glycerol-3-phosphate and creatine.

Extended Data Fig. 4 | PANX1 activation and inhibition during cell death. a, Top, representative histograms of TO-PRO-3 dye uptake in thymocytes across the different conditions. Bottom, PANX1 activation in live and apoptotic thymocytes from wild-type (Panx1+/+) and PANX1-knockout (Panx1−/−) mice as assessed via flow cytometry by measuring the mean fluorescent intensity of TO-PRO-3 dye uptake (n = 3 biological replicates). Data are mean ± s.e.m. ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test. b, Top, representative histograms of TO-PRO-3 dye uptake in Jurkat cells, across the different conditions described. Bottom, PANX1 activation as assessed by flow cytometry of the uptake of TO-PRO-3 dye in apoptotic wild-type Jurkat cells, Jurkat cells expressing mutant PANX1-DN, and Jurkat cells treated with the PANX1 inhibitor spironolactone (50 μM) or trovafloxacin (25 μM) (n = 4 biological replicates). Data are mean ± s.e.m. ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test.

death. a, Control (Panx1+/+) or Panx1−/− thymocytes were treated with anti-Fas antibody (5 μg ml⁻¹) for 1.5 h. Cells were stained with 7AAD and annexin V to determine the percentage of live, apoptotic or necrotic cells, as in Extended Data Fig. 1a. b, Quantification of apoptosis (top) and secondary necrosis (bottom) of control and PANX1-knockout thymocytes (n = 3). Data are [...] comparison test. c, d, Quantification of apoptosis (c) and secondary necrosis (d) from Jurkat cells before metabolomics analysis (n = 4). Data are mean ± s.e.m. ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test. e, Cells were stained with 7AAD and annexin V to determine the percentage of live, apoptotic or necrotic cells.

Extended Data Fig. 6 | PANX1-dependent metabolite release during apoptosis. a, Mass spectrometry was used to determine the relative amounts of AMP, GMP, UDP-glucose and FBP in supernatants from Jurkat T cells across different conditions (n = 4). Data are mean ± s.e.m. ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test. b, Jurkat cells were induced to undergo apoptosis by treatment with anti-Fas antibody (250 ng ml⁻¹). Mass spectrometry was used to measure the absolute concentration per million cells of AMP (top), UDP-glucose (middle) and FBP (F-1,6-BP) (bottom) in the supernatants of Jurkat T cells across different conditions (back-calculated from total cells used in experimental set-up) (n = 3). Data are mean ± s.e.m. *P = 0.031, **P = 0.0013, ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test. c, Mass spectrometry was used to determine the concentrations of AMP, GMP, UDP-glucose and FBP per million cells (back-calculated from total cells used in experimental set-up) in the supernatant from isolated primary thymocytes across different conditions (n = 3). Data are mean ± s.e.m. ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test. d, e, Relative concentrations of inosine (d) and choline (e) in live, apoptotic or apoptotic supernatants in which PANX1 was inhibited were determined by mass spectrometry (n = 4). Data are mean ± s.e.m. ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test.

Extended Data Fig. 7 | Conserved PANX1 secretome. a, Top, three-way Venn diagram comparing PANX1-dependent metabolites released from apoptotic cells across different conditions tested. Bottom, table showing the relative peak intensity (untargeted metabolomics) or absolute concentrations (targeted metabolomics) in the supernatant of the indicated cell treatments and knockout mice. N.D., not determined.

Extended Data Fig. 8 | Transcriptional and metabolic changes during apoptosis. a, Re-analyses of RNA-seq data from apoptotic cells demonstrating that the SRM mRNA levels are increased or retained during apoptosis. b, After induction of apoptosis (n = 4), SRM mRNA expression was assessed over time relative to live controls (n = 5). Data are mean ± s.e.m. **P = 0.007, two-way ANOVA. c, Incorporation of 13C-labelled arginine into the polyamine pathway intermediate spermidine and release from Jurkat cells after apoptosis, and its partial reduction by the pan-caspase inhibitor zVAD (n = 3). Data are mean ± s.d. **P = 0.0088, unpaired two-tailed Student's t-test.

[Plot: Jurkat, UV-induced apoptosis. Axis labels: relative spermidine; hours (0-4). Series: live, apoptotic.]

Extended Data Fig. 9 | Transcriptional changes on surrounding phagocytes induced by PANX1-dependent metabolite release during apoptosis. a, Principal component (PC) analysis on the RNA-seq data as a quality control statistic (n = 4 biological replicates). b, Experimental procedure is described in Fig. 3d. qPCR was used to assess gene expression changes in Ptgs2 (top), Sgk1 (middle) and Slc14a1 (bottom) in phagocytes after treatment with supernatants from Jurkat cells or Jurkat cells expressing DN-PANX1 (n = 7). Data are mean ± s.e.m. **P = 0.0074 (live-AC Sgk1), **P = 0.0031 (AC-AC Sgk1), ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test. c, Experimental procedure is as described in Fig. 3d, except that before treatment of LR73 cells with supernatant, the supernatant was filtered through a 3-kDa filter to remove large molecules. qPCR was used to assess gene expression changes in Sgk1 (top) and Slc14a1 (bottom) in phagocytes after treatment with supernatants under specified conditions (n = 3). Data are mean ± s.e.m. ***P = 0.0001, ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test.

Extended Data Fig. 10 | Analysis of thymic cell death in vivo and effects of supernatants during arthritis. a, Analysis of thymic populations used for experimental data in Fig. 4a. After thymus isolation, the CD11b−CD11c− population that contained thymocytes was used for mRNA isolation to test the efficiency of deletion of the Panx1 allele. qPCR analysis of Panx1 mRNA in control mice (Panx1fl/flCd4-Cre−) (n = 6) or mice in which PANX1 has been knocked out in thymocytes (Panx1fl/flCd4-Cre+) (n = 7). CD11b+CD11c+ myeloid cells obtained from the thymus of Panx1fl/flCd4-Cre+ mice were analysed for Panx1 expression to demonstrate that PANX1 was not deleted. PANX1 was deleted only from thymocytes and not from the myeloid cells that do not express CD4. Data are mean ± s.d. **P = 0.0015, unpaired two-tailed Student's t-test. b, Representative flow cytometric plots showing the extent of apoptosis induced by dexamethasone in control and Panx1fl/flCd4-Cre+ mice. After thymus isolation, cells were stained with 7AAD and annexin V to determine the percentage of live, apoptotic or necrotic cells, as in Extended Data Fig. 1a. c, Quantitative analysis of apoptosis (left) and secondary necrosis (right) of CD11b−CD11c− thymic populations from Panx1fl/flCd4-Cre− (PBS n = 4, Dex n = 10) or Panx1fl/flCd4-Cre+ (PBS n = 3, Dex n = 9) mice treated with PBS or dexamethasone. Data are mean ± s.e.m. ****P < 0.0001, ordinary one-way ANOVA with Tukey's multiple comparison test. d, Representative flow cytometry plots demonstrating the purity of the CD11b+CD11c+ population after magnetic separation from the different mice and treatment conditions. e, Comparison of the CD11b+CD11c+ cells isolated under different conditions (Cre−: PBS n = 4, Dex n = 7; Cre+: PBS n = 3, Dex n = 6). Data are mean ± s.e.m. P > 0.05 (n.s.), ordinary one-way ANOVA with Tukey's multiple comparison test. f, Apoptotic supernatants alleviate arthritic disease induced by serum from K/BxN mice. C57BL/6J mice were injected with serum from K/BxN mice to induce arthritis. Live (n = 4) or apoptotic (n = 5) supernatant was given for five days after arthritis induction. Paw swelling was measured using a calliper and reported as the percentage change compared with day 0. Data are mean ± s.e.m. *P = 0.0131, two-way ANOVA.

Data collection: Gene functions were ascribed using UniProt and article search engines to generate composite lists. Pathway analyses were performed using the MSigDB resource by MIT-Broad Institute. StepOne Software v2.3, BD FACSDiva v8.0, Attune NxT, StepOnePlus v2.3 and NextSeq System Suite for the Illumina NextSeq v500, LC Q Exactive Focus (Thermo Scientific).

Data analysis: GraphPad Prism v.6 and v.7, SPSS v.22, R v3.3.2 (Bioconductor package DESeq2), FlowJo v.8 and v.10 Mac, Xcalibur version 4.2.28.14

- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability


Data Availability 
RNA sequencing data presented in this study are in the NCBI GEO repository under the accession GSE131906. 
Data exclusions: Statistical tests for outliers are routinely performed using Grubbs' test for outliers. No data were excluded in this manuscript.

Replication: Consistent results were obtained from more than two technical replicates per experiment. A significant number of the experiments used at least 3-4 biological replicates.

Randomization: Allocation of mice was random in all in vivo experiments, including mice from different vivariums.

Blinding: In vivo experiments for disease models were all blinded. Researchers conducting experiments, data acquisition, data analysis or histological scoring were blinded to treatment groups.


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems: Antibodies; Eukaryotic cell lines; Palaeontology; Animals and other organisms; Human research participants; Clinical data

Methods: ChIP-seq; Flow cytometry; MRI-based neuroimaging


Antibodies 

Antibodies used: Annexin V-Pacific Blue was from BioLegend (Cat#640919, Lot#B262423). Anti-CD11b-PE (clone M1/70) (Cat#12-0122-81, Lot#4278772), anti-CD11c-PE (clone N418) (Cat#12-0114-82) and anti-CD16/CD32 (clone 93) (Cat#16-0161-85, Lot#4316711) were obtained from Invitrogen. Antibodies specific for Siglec-F-PE (clone E50-2440) (Cat#552126, Lot#7058859) and mouse CD95 (Cat#554254, Lot#35882) were obtained from BD. Human anti-Fas (clone CH11) (Cat#05-201, Lot#2782852) was obtained from Millipore.

Validation: All antibody lots are routinely tested by the manufacturers.


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s): Human Jurkat E6.1, HCT-116 and A549 cells were obtained from ATCC.
Authentication: Morphological shape of cell lines was monitored via microscopic examination.
Mycoplasma contamination: All cell lines used in the laboratory are regularly tested for mycoplasma contamination and tested negative. Additionally, all media and serum lots used are regularly tested and tested negative.
Commonly misidentified lines (see ICLAC register): No commonly misidentified cell lines were used.


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals: C57BL/10 and C57BL/6J wild-type mice were acquired from Jackson Laboratories. To generate mice with deletion of Panx1 in thymocytes, Panx1fl/fl mice were crossed to Cd4-Cre mice (Taconic). KRN TCR transgenic mice were a gift from Dr. Diane Mathis at the Harvard Medical School, and were bred to NOD mice (Jackson Laboratories) to obtain the K/BxN mice. B6Nlrp1b+C1−/−C11−/− mice were a gift from Dr. Mohamed Lamkanfi's lab (VIB/UGent, Belgium). All mice used in this study were 6-12 weeks old. Males were used in arthritis studies and females were used for the naphthalene lung model.
Wild animals: No wild animals.
Field-collected samples: No field-collected samples.
Ethics oversight: Animal procedures were approved and performed according to the Institutional Animal Care and Use Committee (IACUC) at the University of Virginia.
Note that full information on the approval of the study protocol must also be provided in the manuscript.


Flow Cytometry 
Plots 


Confirm that: 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a ‘group’ is an analysis of identical markers). 


All plots are contour plots with outliers or pseudocolor plots. 


A numerical value for number of cells or percentage (with statistics) is provided. 


Methodology 
Sample preparation: Thymocytes and myeloid cells in the thymus were obtained by gentle mechanical disruption. Samples were filtered before staining and kept on ice during staining. All fluorescent antibodies were aliquoted in a sterile hood with minimal light exposure. Samples were protected from light throughout staining.
Instrument: Data were collected on an Attune NxT (Invitrogen).
Software: Data were analyzed using FlowJo v10 software.

Cell population abundance: Purity of isolated samples was determined by antibody staining and FACS. Sample purity was greater than 90%.

Gating strategy: Standard lymphocyte gates were applied, followed by doublet exclusion using FSC-H×W and SSC-H×W. Myeloid cells in the thymus were gated using CD11b and CD11c. Apoptotic cells were gated using annexin V and 7AAD. Pannexin-1 activation was measured using TO-PRO-3 dye. Representative flow plots are shown in the figures and Extended Data.


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
Cancer genomics studies have identified thousands of putative cancer driver genes. Development of high-throughput and accurate models to define the functions of these genes is a major challenge. Here we devised a scalable cancer-spheroid model and performed genome-wide CRISPR screens in 2D monolayers and 3D lung-cancer spheroids. CRISPR phenotypes in 3D more accurately recapitulated those of in vivo tumours, and genes with differential sensitivities between 2D and 3D conditions were highly enriched for genes that are mutated in lung cancers. These analyses also revealed drivers that are essential for cancer growth in 3D and in vivo, but not in 2D. Notably, we found that carboxypeptidase D is responsible for removal of a C-terminal RKRR motif from the α-chain of the insulin-like growth factor 1 receptor that is critical for receptor activity. Carboxypeptidase D expression correlates with patient outcomes in patients with lung cancer, and loss of carboxypeptidase D reduced tumour growth. Our results reveal key differences between 2D and 3D cancer models, and establish a generalizable strategy for performing CRISPR screens in spheroids to reveal cancer vulnerabilities.


Despite the large increase in the catalogue of mutations observed across diverse cancer types (the 'long tail'), it is frequently unclear which mutations are functional cancer drivers. Therefore, it is a central challenge to scalably investigate these genes in relevant cancer models to assign causality and identify cancer-specific vulnerabilities.

Existing in vitro and in vivo models are useful for defining the biological properties of cancer, but each has limitations. Genetically engineered mouse models recapitulate tumour growth and microenvironment, but are limited by scalability, time and cost. Xenograft-based models are limited in scale, and can be difficult to manipulate in vitro. Genome-scale investigation of cancer growth and drug sensitivity has largely relied on in vitro 2D cell culture, which lacks many features of disease, such as hypoxia, altered cell-cell contacts and rewired metabolism. In vitro organoid models alleviate some of these concerns, but are much less scalable.

CRISPR has enabled substantially improved genetic screening in in vitro and in vivo cancer models. Efforts such as DepMap have characterized cancer dependencies using genome-scale CRISPR screens in hundreds of cell lines, revealing previously undiscovered cancer drivers. Nonetheless, it has been difficult to evaluate how differences in culture systems affect the ability to accurately uncover cancer drivers in vivo.

Here we devised a scalable method to propagate lung adenocarcinoma spheroids, and performed genome-wide CRISPR screens in both 2D monolayers and 3D spheroids. Growth phenotypes in 3D more accurately resembled those observed in tumours. Furthermore, genes with differentially stronger effects in 3D were enriched for those significantly mutated in human lung cancers. Among these genes, we identified carboxypeptidase D (CPD), a poorly characterized carboxypeptidase, as an important enzyme for maturation of insulin-like growth factor 1 receptor (IGF1R). Together, these results demonstrate a strategy for genome-scale CRISPR screens in 3D spheroids to identify actionable cancer vulnerabilities.


Scalable 3D spheroid system for CRISPR screens 


Although CRISPR screens performed in 2D monolayers have produced a wealth of information, they often fail to replicate key aspects of tumour biology. This is illustrated by phenotypes measured across more than 500 screens from the DepMap project. Although this resource has uncovered many valuable biological findings, less than 1% of the top 1,000 hits show a positive growth effect (Fig. 1a). Indeed, inactivation of known tumour-suppressor genes often results in negative phenotypes in 2D culture (Fig. 1b).
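
The proportion of positive hits quoted above can be computed per cell line along the following lines. This is a minimal sketch, assuming that 'top hits' are the genes with the largest absolute effect and that a positive effect means improved growth after knockout; the function name and the input format (a mapping from gene name to effect) are hypothetical and are not the DepMap file format.

```python
def fraction_positive_hits(gene_effects, top_n=1000):
    """Fraction of the top_n strongest hits (largest |effect|) that have a positive effect."""
    # Ranking by absolute effect is an assumption made for this illustration.
    ranked = sorted(gene_effects.values(), key=abs, reverse=True)[:top_n]
    if not ranked:
        raise ValueError("no gene effects supplied")
    return sum(1 for effect in ranked if effect > 0) / len(ranked)
```

Applying the function to each cell line in turn would give one value per line, which is how the points in a plot such as Fig. 1a are described.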

We sought to develop a scalable 3D spheroid system to enable high-throughput screens that more closely approximate in vivo cancers. We optimized seeding density and methylcellulose concentrations (Extended Data Fig. 1a, Supplementary Video 1; see Methods) to enable propagation of approximately 200 million cells in 3D spheroids in low-attachment plates. This enabled us to perform genome-wide CRISPR screens in H23 lung adenocarcinoma cells grown in either 2D monolayers or 3D spheroids (Fig. 1c) using our custom single guide RNA (sgRNA) library. Since H23 cells contain a KRAS(G12C) mutation, we also screened with ARS-853, a cysteine-reactive KRAS inhibitor that frequently has stronger effects in 3D.

Fig. 1 | Genome-wide screens performed on 3D cultures improve detection of cancer genes and pathways compared with those performed in 2D. a, Percentage of positive hits among the top 1,000 hits in the DepMap dataset. Each point represents one cell line. b, Median CERES effects of oncogenes and tumour-suppressor genes (TSGs) (annotated in COSMIC) among top 1,000 hits of 517 DepMap cell lines; each data point represents one cell line. c, Schematic for CRISPR screens in H23 cells. d, Distributions of phenotypes. Values on the x axis show the effect size of each gene and the y axis shows absolute phenotype score (T-score) (see Methods). Dot size represents absolute T-score. e, Phenotype scores for oncogenes and TSGs in top 1,000 hits in each condition. P values calculated using two-sided t-test. f, Enriched pathways among the top 1,000 hits from each condition analysed using PANTHER overrepresentation test (see Methods). Significance of enriched pathways was measured with Fisher's exact test and the Benjamini-Hochberg false-discovery rates (FDR) were subsequently computed (x axis). The number of genes in enriched pathways is shown on the right. In all box plots, box limits mark upper and lower quartiles, whiskers represent 1.5× the interquartile range and points show outliers.


3D phenotypes reflect cancer dependencies


Reproducibility and quality of 3D screening data were equivalent to those of the 2D data (Extended Data Fig. 1b-d, Supplementary Table 1), and it was immediately apparent that CRISPR screens in 3D uncovered many more positive growth phenotypes, whereas most hits from the 2D screens had negative phenotypes (Fig. 1d). This became more apparent when we examined genes with differential effects in 3D by normalizing 3D phenotypes against the corresponding 2D phenotypes (3D/2D) (see Methods). We next analysed phenotypes for oncogenes and tumour suppressor genes (TSGs) annotated in the COSMIC database within the top 1,000 hits in 2D or 3D conditions. Both groups were similar in 2D, showing negative median-growth phenotypes (Fig. 1e). In 3D spheroids, however, oncogenes and TSGs exhibited markedly different behaviours, with knockout of TSGs showing more positive-growth phenotypes; this was clearer when the 3D/2D phenotype was considered (Fig. 1e, Extended Data Fig. 1e).

Pathway-enrichment analysis revealed that a distinct set of cancer-specific pathways, including the p53 and Ras pathways (known drivers in H23 cells), was enriched among hits in 3D and 3D/2D phenotypes, whereas 2D hits were generally related to common essential cellular functions such as DNA replication (Fig. 1f). Together, these data suggest that screens in 3D more accurately capture features of cancer genes and pathways (Extended Data Fig. 1f).
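
The Fig. 1 legend describes the enrichment statistics as Fisher's exact test followed by Benjamini-Hochberg false-discovery rates. The sketch below illustrates those two steps with the standard library only. It is not the PANTHER implementation: the one-sided (over-representation) form of the test, the function names and the input layout are assumptions.

```python
from math import comb


def fisher_enrichment_p(hits_in_pathway, n_hits, pathway_size, n_genes):
    """One-sided Fisher's exact P value for over-representation.

    Probability of seeing at least hits_in_pathway pathway genes among n_hits
    hits drawn from n_genes genes, of which pathway_size are in the pathway.
    """
    upper = min(n_hits, pathway_size)
    total = comb(n_genes, n_hits)
    tail = sum(
        comb(pathway_size, k) * comb(n_genes - pathway_size, n_hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one P value would be computed per pathway for the hit list of each condition, and the resulting list passed to `benjamini_hochberg` to obtain the FDR values.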


3D hits are frequently mutated in cancer 


We further investigated the phenotypes for genes frequently mutated in lung adenocarcinoma and squamous cell carcinoma (hereafter, 'pan-lung'). When genes were sorted by the absolute value of their phenotypic strength, inactivation of the ten most-frequently mutated genes in the pan-lung cancer cohort showed weaker and more widely distributed effects in 2D (Extended Data Fig. 1g, Supplementary Table 2). By contrast, many of these frequently mutated genes showed stronger phenotypes in 3D spheroids. Notably, the 3D/2D phenotypes showed a further improved ability to detect strong phenotypes for genes that are frequently mutated in lung cancer. This is consistent with the pathway-enrichment analysis described above, and suggests that analysis of genes with differentially strong effects in 3D may increase the power to identify cancer drivers.

Fig. 2 | Genes with differential 3D/2D phenotypes are enriched for significantly mutated lung cancer genes. a, Cumulative sum of the significance of 11,249 pan-lung cancer genes from 1,144 patients with lung cancer measured by MutSig2CV. The x axis shows phenotypes sorted by strength in 2D, 3D and 3D/2D. The top 3,000 genes are shown. b, Schematic for batch-retest CRISPR screens. s.c., subcutaneous injection. c, Comparisons between in vitro and in vivo phenotypes in the H23 batch-retest screens. Data are fit by linear regression (blue line); shaded bands indicate 95% confidence intervals. Pearson corr., Pearson correlation coefficient. d, Significance of 744 pan-lung cancer genes measured by MutSig2CV, displayed as cumulative sum plots against genes sorted by absolute values of 3D/2D phenotypes in H23 cells, average (avg.) 3D/2D phenotypes across 10 lung cancer lines, and H23 in vivo/2D phenotypes in batch-retest screens.

To systematically confirm this, we compared absolute CRISPR phenotypes (sorted by phenotypic strength) with the cumulative sum of significance of pan-lung cancer mutations (Fig. 2a, Supplementary Table 3). Again, genes with stronger phenotypes in 3D and, to a greater extent, those with stronger phenotypes in 3D/2D, were enriched for significant lung-cancer mutations. We reasoned that two factors probably contribute to this enrichment. First, normalizing 3D with 2D phenotypes may unmask cancer-specific genes by minimizing the otherwise dominating effects of core essential genes (for example, ribosomal genes) that are critical for both 2D and 3D growth (Extended Data Fig. 1g). Second, as previously suggested, 3D spheroids are more likely to mimic in vivo tumours.

Additional genome-wide screens in H1975 and H2009 lung cancer lines confirmed key advantages of 3D spheroids, including improved detection of cancer pathways and identification of the known drivers for each of these lines (EGFR-PI3K and p53-KRAS, respectively, Extended Data Figs. 2, 3, Supplementary Discussion).


3D spheroids better match tumour xenografts

To systematically compare CRISPR screens in 2D monolayers, 3D spheroids and tumour xenografts, we generated a small batch-retest sgRNA library targeting 911 top hits with differential 3D growth effects in our genome-wide screens (Fig. 2b, Supplementary Table 4). We transduced this library into H23 cells and compared growth in subcutaneous xenograft tumours with growth in 2D and 3D cultures. We optimized a protocol (see Methods) for in vivo CRISPR screening, and obtained highly reproducible data from tumour xenografts (Extended Data Fig. 4a, b, Supplementary Table 5). Notably, phenotypes of genes from 3D screens were much more closely correlated with those in mouse xenografts than those from 2D screens (Fig. 2c, Supplementary Discussion).
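
The comparison between in vitro and in vivo phenotypes is summarized with a Pearson correlation coefficient. As a minimal sketch of that calculation, the function below takes two equally ordered lists of per-gene phenotype scores; the function name and input layout are hypothetical, and no values from the paper are reproduced.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / sqrt(sxx * syy)
```

Computing `pearson_r` once for the 2D scores against the xenograft scores and once for the 3D scores against the xenograft scores would give the two coefficients being compared.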


To search for common 3D-selective vulnerabilities in lung adenocarcinoma, we used the same batch-retest library to perform 2D and 3D screens across multiple cancer lines. We again observed marked differences between 2D and 3D screens in all lines (Supplementary Table 5). Averaging 3D/2D phenotypes across ten cell lines further increased detection of significant mutations observed in patients with lung cancer compared with phenotypes from the H23 cell line alone (Fig. 2d). Of note, comparison of in vivo phenotypes with 2D phenotypes (in vivo/2D) in H23 cells also increased detection of significant mutations compared with the in vitro 3D/2D phenotypes. Notably, top sensitizing hits from the averaged 3D/2D phenotypes include several known regulators of RAS-MAPK pathways such as GRB2, SHOC2, PTPN11 (also known as SHP2), GAB1 and MAPK1.


CPD module shows selective 3D growth effects 


Given that genes with strong 3D/2D phenotypes are enriched for lung 
cancer mutations, we reasoned that these might include novel thera- 
peutic targets. To identify such targets, we defined functional gene 
modules on the basis of their correlated phenotypes in DepMap” and 
examined their phenotypes. Simultaneous depletion of multiple genes 
from the same functional group should help define vulnerabilities 
within pathways/complexes; indeed, we identified a number of dif- 
ferentially enriched modules from expected genes, including KRAS, 
mTOR and Hippo pathways (Supplementary Discussion). 

Notably, a module composed of genes correlated with CPD was the 
most strongly depleted in the 3D/2D phenotype (Fig. 3a, Extended Data 
Fig. 4c) and showed strong synthetic lethality with the KRAS(G12C) 
inhibitor specifically in 3D. This suggested that CPD and its functional 
interactors could be promising therapeutic targets. CPD is a poorly 
characterized member of the metallocarboxypeptidase family that 
cleaves C-terminal arginines and lysines from polypeptides”; it is local- 
ized in the trans-Golgi network®. CPD is correlated with FURIN, ATP2C1, 
IGF1R, MET and GAB1 in a functional module (Fig. 3b, c, Extended Data 
Fig. 4d-f), but not with a control olfactory receptor gene. Given that 
FURIN and ATP2C1 are critical for processing of IGF1R and MET in the 


trans-Golgi** *8, we hypothesized that CPD might have a related role. 
Fig. 3 | The CPD module is critical for 3D spheroid growth and IGF1R 
function. a, Top negative modules (blue). Top positive modules (yellow). The 
y axis shows significance of enrichment for co-essential modules (P values, 
two-sided Mann-Whitney U-test); the x axis shows average gene effects of 
co-essential modules (see Methods). KRASi, KRAS inhibitor. b, Proteins encoded 
by genes in the CPD co-essential module and their localization along the 
TGN-plasma membrane. c, Cluster map showing batch-corrected CERES gene 
effects for the CPD module. d, CPD module and selected top 3D/2D hits were 
validated with individual sgRNAs in competitive growth assays. Left 


To interrogate interactions within the CPD module in H23 cells, we 
measured all pairwise genetic interactions of 145 selected genes with 
strong 3D/2D phenotypes using CRISPR double-knockout screening” 
(Extended Data Fig. 5, Supplementary Tables 6, 7). Similar to their 
behaviour in DepMap, genetic interaction patterns of FURIN and IGF1R 
showed strong correlation with those of CPD. 

Given the strong 3D/2D phenotypes of genes within the CPD module, 
we validated individual genes within the module and other strong hits 
using competitive growth assays and small-molecule inhibitors (Fig. 3d, 
Extended Data Fig. 6). We also observed that inducible knockdown 
of CPD in vitro in established H23 3D spheroids using tetracycline- 
inducible dCas9-KRAB” markedly reduced growth of spheroids 
(Extended Data Fig. 7), suggesting that targeting CPD can have an effect 
on further growth of established spheroids. 


IGF1R signalling is inhibited by CPD deletion 


Since our data suggested that CPD functionally interacts with IGF1R, we 
examined how CPD deletion affected IGF1R signalling pathways. We 
first measured protein levels of IGF1R and phosphorylation of its downstream 
effectors AKT and ERK1/2 following treatment with IGF1 (Fig. 3e, 
f) in H23 cells grown in 2D. We observed significant reduction of IGF1R 
protein levels and AKT phosphorylation in CPD-deficient H23 cells compared 
with control cells. By contrast, phospho-ERK1/2 levels were high 
and unchanged, probably owing to constitutively active KRAS in H23 
cells. Levels of IGF1R were also significantly reduced in CPD-deficient 
H23 spheroids (Fig. 3g, h). In addition, CPD deletion reduced levels of 
IGF1R and phospho-AKT upon IGF1 addition in H322, A549 and H358 cells 
micrographs show example images of growth assays for control sgRNA in 2D 
and 3D. Data are mean ± s.e.m., n = 3; P values calculated by two-sided t-test 
between the control and gene-targeting sgRNAs. Control (safe) sgRNAs are 
described in the Methods. e, Control, CPD-knockout (KO) and IGF1R-knockout 
H23 cells grown in 2D were stimulated with IGF1 (100 ng ml⁻¹) for 15 min and 
levels of IGF1R and activities of downstream effectors were measured by 
immunofluorescence. Scale bars, 20 μm. f, Quantification of 
immunofluorescence in e. *P=6.4 x 10+, **P=1.24 x10> (mean ± s.e.m., n = 4; 
two-sided t-test). 


(Extended Data Fig. 8). Lastly, we found that the effect of CPD deletion 
can be rescued by treating H23 cells with excess IGF1, but not by treat- 
ment with epidermal growth factor (EGF) or hepatocyte growth factor 
(HGF) (Extended Data Fig. 9a, b), suggesting that much of the 3D-selective 
CPD-knockout phenotype can be attributed to its regulation of IGF1R. 


CPD removes the IGF1Rα C-terminal RKRR motif 


Since CPD is a carboxypeptidase, we considered whether IGF1R might 
be a substrate. IGF1R is translated as a single polypeptide (pro-IGF1R), 
which is cleaved by FURIN into α- and β-chains? (Fig. 4a). pro-IGF1R 
does not end in lysine or arginine, and thus should not be a substrate 
for CPD; however, FURIN cleaves pro-IGF1R immediately after a central 
RKRR motif?“°, leaving these four positively charged amino acids at the 
C terminus of the α-chain, creating a potential CPD substrate. 

To test whether the RKRR motif is removed by CPD, we developed an 
assay to measure appearance of the 1D4 epitope”. Using the Rho1D4 
antibody, which requires a free carboxylate group for binding, we could 
detect the presence of the 1D4 epitope specifically at the C terminus of 
a protein. We thus created an IGF1R reporter with a 1D4 epitope inserted 
immediately upstream of the RKRR motif (Fig. 4b). A Flag epitope on 
the β-chain was used to measure total protein levels. 

When we transduced control H23 cells with the reporter, we observed 
strong 1D4 and Flag signals, suggesting that RKRR is removed and the 
1D4 epitope is exposed at the C terminus of the α-chain (Fig. 4c, e). Deletion 
of CPD markedly reduced 1D4 staining, whereas total Flag-IGF1R 
remained unchanged, suggesting that CPD removes the RKRR motif. 
Consistent with these results, a FURIN inhibitor reduced the 1D4 signal 
Fig. 4 | CPD is a carboxypeptidase for the IGF1R α-chain and loss of CPD 
inhibits in vivo tumour growth. a, Proposed model of the CPD-IGF1R 
interaction. b, Schematic of 1D4 reporters used to test the model in a. c, Flag 
and 1D4 immunofluorescence from the 1D4 reporters, measured in control or 
CPD-knockout H23 cells grown in 2D culture, untreated or treated with FURIN 
inhibitor. Scale bars, 40 μm. d, Immunofluorescence of HA-RKRR reporter in 
control or CPD-knockout H23 cells. Scale bar, 40 μm. e, Ratios of 1D4 (or HA) to 
Flag immunofluorescence relative to the control 1D4-RKRR or to the control 
HA-RKRR for conditions in c and d. *P=1.38 x 10°, two-sided t-test (n = 19, 30, 
18, 12, 20, 21, 18, 18, 18 and 18 from left to right, mean ± s.d.). f, Schematic for the 
competitive tumour-growth assay. g, Immunofluorescence of mCherry and 
GFP signal in day 30 tumour sections. Scale bar, 100 μm. Original 
magnifications: 10x (main images), 20x (insets). Immunofluorescence 
experiments were repeated on two tumours per condition. h, Changes in 
mCherry/GFP ratios between day 0 and day 30 following tumour 


in both control cells and cells in which CPD was deleted. FURIN inhibi- 
tion would be expected to prevent cleavage of pro-IGF1R and exposure 
of the RKRR motif. Insertion of even a single amino acid between the 
1D4 and the RKRR motif diminished the 1D4 signal, demonstrating the 
precise requirement for the removal of RKRR. An IGF1R reporter with 
a control haemagglutinin (HA) epitope upstream of RKRR showed a 
strong HA signal in both control and CPD-depleted cells (Fig. 4d, e). 
Similarly, CPD-mediated removal of the RKRR motif was observed 
in H322 and A549 cells (Extended Data Fig. 9c, d). Together, these 
data demonstrate that CPD is a carboxypeptidase that is required for 
IGF1R maturation. Notably, pro-MET is also cleaved by FURIN after a 
KRKKR motif. Although we observed toxicity upon expression of a 
MET 1D4 reporter in H23 cells, we were able to express the reporter in 
H322 cells—deletion of CPD prevented removal of the KRKKR motif in 
these cells (Extended Data Fig. 9e). Therefore, MET is also a probable 
substrate of CPD. 


CPD as a therapeutic target for lung cancers 


Given the known role of IGF1R signalling in cancers, we further 
assessed whether CPD deletion affects in vivo tumour growth. We 
performed competitive growth assays by subcutaneous injection of 
a mixed pool of H23 cells that expressed either a sgRNA targeting CPD 
(labelled with mCherry) or a control sgRNA (labelled with GFP) into 
mice (Fig. 4f). Immunofluorescence images of tumour sections showed 
that the tumours were dominated by GFP-expressing cells, indicating 


140 | Nature | Vol 580 | 2 April 2020 


transplantation. *P=4.3 x 10~°, two-sided t-test; n = 8 tumours per group). In 
the box plot, the centre line shows the median, box limits mark upper and lower 
quartiles, whiskers represent 1.5x the interquartile range and points show 
outliers. i, Kaplan-Meier plot for patients with lung adenocarcinoma with high 
or low CPD expression. A median split was used and curve separation was 
assessed by two-sided log-rank test. n = 1,106, *P < 0.0001. j, Variation of the set 
of genes downregulated by CPD deletion in H23 spheroids was scored by gene-set 
variation analysis (CPD GSVA score; see Methods). Kaplan-Meier plot for 
survival of 479 patients with lung adenocarcinoma, divided into two groups 
with high or low CPD GSVA scores. Curve separation assessed by two-sided 
log-rank test (*P=9 x 10°) and Cox proportional-hazard test (**P=7.68 x 10“). 
k, CPD deletion sensitizes H23 cells to ARS-853, an inhibitor of KRAS(G12C). 
H23 cells in 2D or 3D culture treated with control or CPD sgRNA and indicated 
doses of ARS-853 for 72 h. Live cells were quantified using alamar blue (n = 4, 
mean ± s.e.m.). 


that cells deficient in CPD did not readily form tumours (Fig. 4g). By 
contrast, deletion of CREBBP, a strongly positive hit in the 3D spheroids, 
promoted tumour growth, as reflected by dominant mCherry signal 
in the tumours. Flow cytometry measurement of mCherry:GFP ratios 
confirmed these results (Fig. 4h). 

We next investigated whether expression levels of CPD were prog- 
nostic for patient survival. In a meta-analysis of expression signatures 
from around 18,000 human tumours with survival outcomes using 
PRECOG*, high expression of CPD is a strong indicator for poor prog- 
nosis of patients with lung adenocarcinoma (Extended Data Fig. 10a, 
b). A Kaplan-Meier plot generated from the merged data confirmed 
this result (Fig. 4i). We also showed that high expression of genes downregulated 
in a CPD knockout (identified by RNA-seq) is an indicator of 
poor prognosis in patients (see Methods, Fig. 4j, Extended Data Fig. 10c, 
Supplementary Table 8). 

KRAS mutations occur in about 17% of lung cancers“, and inhibi- 
tors have been developed”*?**s for the KRAS(G12C) mutant, the most 
common KRAS mutant in lung adenocarcinomas”). Since inhibition 
of IGF1R can inhibit growth of KRAS-mutant lung cancer*® and CPD 
was a top synthetic lethal hit with ARS-853 in our screens (Fig. 3a), we 
examined how deletion of CPD affects the response of H23 cells to 
ARS-853. CPD deletion greatly sensitized H23 cells to the drug, particu- 
larly in 3D culture (Fig. 4k). Consistent with this, expression of genes 
downregulated in CPD-knockout spheroids more strongly predicts the 
survival of patients with lung adenocarcinoma with KRAS mutations 
than with wild-type KRAS (Extended Data Fig. 10d, e). 


We further investigated potential synergy between ARS-853 and loss of 
CPD in additional KRAS(G12C)-mutant lung cancer cell lines (Extended Data 
Fig. 10f, g). We observed even greater synergy in H358 cells, whereas no 
synergy was detected in H1792 cells. Of note, H1792 cells do not have a 
phenotype for loss of CPD (Supplementary Table 5), and show negligible 
IGF1R expression (Extended Data Fig. 10h). This suggests that IGF1R expression 
and/or dependency and KRAS mutation may serve as biomarkers for 
combinatorial therapies targeting CPD and KRAS(G12C) in lung cancers. 


Conclusions 


Here we have demonstrated a robust strategy to conduct genome-scale 
CRISPR screens in 3D spheroids. Phenotypes in 3D more closely match 
expectations for oncogenes and TSGs, and are better aligned with those 
in tumour xenografts. Accurate in vitro modelling of loss-of-function 
phenotypes in tumours is likely to become important for personaliza- 
tion of therapeutic strategies (Supplementary Discussion). For example: 
CREBBP inhibitors have been used to treat various cancers*’; however, 
in certain lung cancer lines tested here, CREBBP knockout had a negative 
effect on 2D growth, but a profoundly positive effect on growth in 
3D spheroids and mouse xenografts (Fig. 3d, Fig. 4g, h, Supplementary 
Table 5), arguing against the use of CREBBP inhibitors in these cases. 

Of note, genes with differentially strong effects in 3D culture versus 
2D culture are enriched for frequently occurring lung cancer mutations. 
This could be because these genes govern the transition to more aggressive 
3D growth, a hallmark of tumorigenesis”. This probably includes 
genes involved in cell adhesion or genes that enable responses to 
‘tumour-like’ stresses in the spheroids, such as hypoxia or cell crowding. 

Ongoing efforts to investigate the roles of matrix composition*®, 
nutrient conditions”, cancer-associated fibroblasts*° and tumour- 
infiltrating immune cells” have enabled substantial improvements in 
in vitro and patient-derived organoid models of cancer. The ability to 
systematically and scalably determine which genes are required for 
growth and survival in response to such distinct environmental cues 
should facilitate both improved models for drug-target identification 
and a better understanding of cancer growth. 
Methods 


Cell lines 

Ten non-small-cell lung carcinoma cell lines (NCI-H1437, NCI-H1568, 
NCI-H1650, NCI-H1975, NCI-H322, NCI-H1792, NCI-H2009, NCI-H23, 
NCI-H358 and A549) were purchased from the American Type Culture Collection. 
All cell lines were authenticated using the Human 9-Marker STR 
Profile test provided by IDEXX BioResearch and tested for mycoplasma 
contamination. Cells were cultured in RPMI 1640 (Gibco) supplemented 
with 10% FBS (HyClone), penicillin-streptomycin (Genesee), and GlutaMAX 
(Gibco). These 10 cell lines were transduced with a spCas9 lentiviral 
vector with a blasticidin selection marker (Addgene no. 52962), and 
selected with blasticidin (10 μg ml⁻¹). Single-cell clones of these selected 
cell lines were individually tested for their Cas9-cutting efficiency by 
lentiviral infection with pMCB306”, a self-GFP-cutting reporter that 
has both GFP and an sgRNA against GFP on the same backbone. Single 
clones with high Cas9-cutting efficiency were established and used in 
the CRISPR screens and other biological assays. 


Large-scale 3D spheroid cultures 

To culture lung cancer cells as 3D spheroids at genome scale, we used 
either pre-treated ultra-low attachment plates (Corning, no. 3261) or 
polyhema (Sigma, no. P3932) coated tissue culture plates. Methylcellulose 
(0.75%; Fisher, no. M-352) was added to RPMI 1640 growth medium 
to prevent excessive aggregation of cells in spheroid culture and to 
maintain even spheroid size. To determine an appropriate cell density 
for CRISPR screens, we tested multiple seeding densities of H23 cells 
ranging from 20,000 cells per cm² to 150,000 cells per cm², with 500 μl 
of growth medium per cm². H23 cells were seeded at multiple densities 
and their growth and death rates were monitored in an automated 
fluorescent microscope optimized for live-cell imaging (IncuCyte S3 or 
IncuCyte ZOOM, Essen Bioscience). Cell growth rates were monitored 
by mCherry expressed in the cell line and death rates were monitored 
by Sytox Green signal, which was added at 100 nM final concentration 
at the beginning of the experiment. Here, the number of live cells in 
spheroids was estimated by dividing total integrated mCherry intensi- 
ties of spheroids by the average integrated mCherry intensity of single 
live cells measured at the initial cell-seeding phase. The number of 
dead cells was estimated similarly by dividing total integrated Sytox 
Green intensities of spheroids by the average integrated Sytox Green 
intensity of a single dead cell. We chose a cell density (50,000 cells per 
cm²) that showed about 30% peak cell death rate within 24 h after initial 
seeding. For all subsequent experiments, cells were initially seeded at 
a density of 50,000 cells per cm² in 500 μl of RPMI 1640 medium containing 
0.75% methylcellulose. Spheroids were then split every 3-4 days. 
To passage cells, cancer spheroids were collected in methylcellulose 
media and diluted with PBS (~3 medium volumes) to reduce viscosity of 
the medium before centrifugation. Spheroids were then centrifuged at 
800g for 15 min and medium and PBS were removed from the spheroid 
pellets. Accutase (Innovative Cell Technologies, no. AT104) was added 
to the pellets to dissociate the spheroids into single cells. We used 10 
ml of accutase per 100 million cells in spheroids and incubated them 
for about 30 min until spheroids were fully dissociated into single cells. 
The single cells were then reseeded at the starting density (50,000 cells 
per cm², 500 μl growth medium per cm²). 
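
As a rough illustration of the cell-number estimate described above, the following Python sketch divides total integrated fluorescence by a per-cell intensity and converts a culture area into seeding quantities. The function names, the example intensities and the definition of the dead-cell fraction (dead / (live + dead)) are assumptions made for illustration; they are not taken from the original imaging pipeline.

```python
# Seeding parameters stated in the text.
SEEDING_DENSITY_PER_CM2 = 50_000   # cells per cm^2
MEDIUM_UL_PER_CM2 = 500            # microlitres of growth medium per cm^2


def estimate_cell_count(total_intensity, per_cell_intensity):
    """Estimate a cell number as total integrated intensity / single-cell intensity."""
    if per_cell_intensity <= 0:
        raise ValueError("per-cell intensity must be positive")
    return total_intensity / per_cell_intensity


def dead_fraction(live_cells, dead_cells):
    """Fraction of dead cells (assumed definition: dead / (live + dead))."""
    return dead_cells / (live_cells + dead_cells)


def seeding_plan(area_cm2):
    """Return (cells to seed, medium volume in ml) for a given culture area."""
    cells = area_cm2 * SEEDING_DENSITY_PER_CM2
    medium_ml = area_cm2 * MEDIUM_UL_PER_CM2 / 1000.0
    return cells, medium_ml


# Hypothetical intensities, chosen only to make the arithmetic easy to follow.
live = estimate_cell_count(7.0e6, 1.0e3)   # mCherry: total / single live cell
dead = estimate_cell_count(1.5e6, 5.0e2)   # Sytox Green: total / single dead cell
print(live, dead, dead_fraction(live, dead))
print(seeding_plan(150.0))
```

The per-cell reference intensities would have to be measured at the initial seeding phase, as the text describes; the sketch only shows the division step.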


Genome-wide and batch-retest CRISPR screens 

The genome-wide CRISPR library and the batch-retest library were 
synthesized by Agilent and cloned as previously described”. The 
genome-wide CRISPR library was designed to have ~210,000 sgRNAs 
targeting 21,000 coding genes (10 sgRNAs per gene), with 13,500 negative 
control sgRNAs that are either scrambled, non-targeting sgRNAs 
or safe-sgRNAs targeting nonfunctional regions of the human genome. 
To design the batch-retest library, genes with 3D/2D phenotypes passing a 
T-score cut-off (lower than −2.5 or higher than 3) were first selected 
from the H23 genome-wide screens. We also included hits obtained 
in the 2D and 3D screens in the presence of the KRAS inhibitor, with 
phenotypes passing a T-score cut-off (lower than −2.5 or higher than 2.5). 
In addition, we included genes with known clinical drugs or druggable 
genes (such as kinases, phosphatases and other enzymes) and manually 
curated RAS-pathway genes that were hits in both 2D and 3D. The batch-retest 
library had 5,466 sgRNAs targeting these 911 hit genes (6 sgRNAs 
per gene) and 273 safe-sgRNAs. In brief, oligo pools for the libraries 
were synthesized (Agilent), PCR-amplified, digested with BstXI and 
BlpI restriction enzymes, and ligated into the pMCB320 vector containing 
an mU6 promoter to drive sgRNA expression and an EF1α promoter to 
drive expression of mCherry fused to puromycin with a T2A linker. The 
plasmid libraries were then transfected into HEK293T cells to produce 
lentiviral pools, which were subsequently transduced into H23 cells and 
other indicated lung cancer cell lines. Cells were infected with the libraries 
at a multiplicity of infection of 0.3-0.5, and after 48 h were selected 
with puromycin (2 μg ml⁻¹) for 3-5 days until the library-infected cell 
population was at least 90% mCherry positive (indicating presence of 
lentivirus). Cells were expanded for another 2-3 days and aliquots were 
saved as T0 stocks in liquid nitrogen. At the same time, the remaining 
cells were plated as 2D monolayer cultures or as 3D spheroids using the 
protocol described above. To maintain library complexity, the screens 
were performed at ~1,000× cell number coverage per sgRNA for the 
genome-wide screens (~200 million cells) and at ~2,000× cell number 
coverage for the batch-retest screens (~10 million cells). All screens 
were performed in biological replicates. In the genome-wide screens, 
we included an arm in which H23 cells were treated with ARS-853 at 
5 μM throughout the screens. Both 2D and 3D cultures were split every 
3-4 days to keep cells in log growth phase throughout the screens. At 
day 21, cells were collected and stored in multiple cryovials (no. of cells 
in each cryovial for at least ~1,000× library coverage) in liquid nitrogen 
for further processing. Genomic DNA was extracted from the samples 
with Qiagen Blood Maxi Kit (Qiagen, no. 51194). sgRNA cassettes were 
PCR-amplified from genomic DNA using Herculase II Fusion polymer- 
ase (Agilent, no. 600679) and deep-sequencing adapters and sample 
barcodes were added during the PCR™. Finally, sgRNA compositions 
in the samples were measured with deep-sequencing on NextSeq 550 
system (Illumina). Enrichments or disenrichments of sgRNAs either 
between T0 and end time point samples or between drug untreated 
and treated samples were then used to calculate growth or drug resist- 
ance phenotypes. 
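
The coverage figures quoted above follow from multiplying library size by the desired number of cells per sgRNA. The short Python sketch below reproduces that arithmetic. It assumes that the negative-control sgRNAs are counted in addition to the gene-targeting sgRNAs, which the text does not state explicitly; the function and constant names are illustrative.

```python
def cells_for_coverage(n_library_elements, coverage):
    """Number of cells needed so that each library element is carried by
    `coverage` cells on average."""
    return n_library_elements * coverage


# Library sizes taken from the text; adding the controls is an assumption.
GENOME_WIDE_SGRNAS = 210_000 + 13_500   # gene-targeting + negative controls
BATCH_RETEST_SGRNAS = 5_466 + 273       # gene-targeting + safe sgRNAs

genome_wide_cells = cells_for_coverage(GENOME_WIDE_SGRNAS, 1_000)
batch_retest_cells = cells_for_coverage(BATCH_RETEST_SGRNAS, 2_000)

print(f"genome-wide: {genome_wide_cells:,} cells")
print(f"batch-retest: {batch_retest_cells:,} cells")
```

Under these assumptions the totals are about 2.2 × 10^8 and 1.1 × 10^7 cells, in line with the approximate values of ~200 million and ~10 million cells given in the text.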


Construction of CDKO library and CDKO screen 

The 145 x 145 CRISPR double-knockout (CDKO) library was constructed 
as previously described”. In brief, the 145 genes that have the most negative 
3D/2D phenotypes were chosen for the CDKO library. The three sgRNAs 
that showed the strongest effects in the genome-wide screens were chosen 
for each gene. A total of 463 sgRNAs (435 gene-targeting sgRNAs and 
28 safe sgRNAs) were PCR-amplified from pooled oligo chips (Agilent) 
and cloned into pMCB320 and pKHH030, which are lentiviral vectors 
with mU6 or hU6 promoters, respectively. hU6-sgRNA-tracrRNA cas- 
settes were then digested from the single-knockout CRISPR library 
based on pKHH030 and ligated into the single-knockout CRISPR library 
based on pMCB320 downstream of the mU6-sgRNA-tracrRNA cassettes. 
This generated the 145 x 145 CDKO library, which had 214,368 double- 
sgRNAs corresponding to 10,440 gene pairs. The CDKO screen was 
performed as other CRISPR screens at ~1,000× cell number coverage per 
sgRNA pair for 21 days in 2D monolayer H23 cells (~200 million cells). 
The screens were carried out in two experimental replicates starting 
from the same T0 population. Genomic DNA from both T0 and day 21 
samples was isolated and frequencies of double-sgRNAs were quantified 
by deep sequencing using a modified paired-end, single index 
protocol on NextSeq 550 as previously described®. 


Calculation of growth and drug resistance phenotypes 

Effect sizes for sgRNAs were calculated as previously described”. In 
brief, log2 fold enrichments of sgRNAs were first measured between two 
samples: T0 and day 21 samples for 2D and 3D phenotypes, T0 and day 
30 samples for in vivo phenotypes, 2D day 21 and 3D day 21 samples for 
3D/2D phenotypes, 2D day 21 and ARS-853 treated 2D day 21 samples 
for KRASi 2D phenotypes, and finally 3D day 21 and ARS-853 treated 3D 
day 21 samples for KRASi 3D phenotypes. The 3D/2D phenotypes were 
obtained by calculating enrichment of sgRNAs (read counts of sgRNAs) 
by comparing 2D day 21 samples with 3D day 21 samples directly. For 
any given phenotype, a median log2 fold enrichment of all negative 
control sgRNAs (non-targeting and safe sgRNAs) was measured and this 
median value was subtracted from log2 fold enrichments of all sgRNAs 
to account for systematic bias in screens. Lastly, log2 fold enrichments 
of all sgRNAs were divided by the standard deviation of negative control 
sgRNAs to yield phenotype Z scores (pZ) of sgRNAs, which we used as 
the effect size of sgRNAs. The effect size of a gene is the median value of all 
sgRNAs that target the gene. We used modified t-value scores as our 
phenotype scores for genes, which account for both consistency and 
strength of all sgRNA effects for given genes. 
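
The normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used for the screens: the pseudocount, the normalization of read counts to library totals, the use of the sample standard deviation and all identifiers are assumptions introduced for the example.

```python
import math
import statistics


def log2_fold_enrichment(count_start, count_end, total_start, total_end,
                         pseudocount=1.0):
    """log2 ratio of an sgRNA's read fraction in the end sample over the start
    sample. The pseudocount (an assumption) avoids division by zero."""
    frac_start = (count_start + pseudocount) / total_start
    frac_end = (count_end + pseudocount) / total_end
    return math.log2(frac_end / frac_start)


def phenotype_z_scores(log2fe, control_ids):
    """Centre on the median of the negative controls and scale by their
    standard deviation to obtain pZ values for every sgRNA."""
    ctrl = [log2fe[s] for s in control_ids]
    ctrl_median = statistics.median(ctrl)
    ctrl_sd = statistics.stdev(ctrl)   # sample SD; the exact estimator is an assumption
    return {s: (v - ctrl_median) / ctrl_sd for s, v in log2fe.items()}


def gene_effect(pz, sgrna_ids):
    """Gene-level effect: median pZ of the sgRNAs that target the gene."""
    return statistics.median([pz[s] for s in sgrna_ids])


# Toy log2 fold enrichments for three controls and three sgRNAs of one gene.
log2fe = {"ctrl_1": -1.0, "ctrl_2": 0.0, "ctrl_3": 1.0,
          "geneA_1": -4.0, "geneA_2": -2.0, "geneA_3": -3.0}
pz = phenotype_z_scores(log2fe, ["ctrl_1", "ctrl_2", "ctrl_3"])
effect = gene_effect(pz, ["geneA_1", "geneA_2", "geneA_3"])
print(pz, effect)
```

In the toy data the controls have median 0 and standard deviation 1, so the pZ values equal the input log2 fold enrichments and the gene effect is their median.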

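Our phenotype scores based on t-value scores were computed 
as: phenotype score (T-score) = $(U_{\mathrm{gene}} - U_{\mathrm{ctrl}})/\sqrt{S_{\mathrm{var}}/N_{\mathrm{exp}} + S_{\mathrm{var}}/N_{\mathrm{ctrl}}}$, 
where $U_{\mathrm{gene}}$ is the median effect of all sgRNAs (pZ) for a given gene, 
$U_{\mathrm{ctrl}}$ is the median effect of all negative control sgRNAs (pZ), and $S_{\mathrm{var}}$ 
is $\mathrm{Var}_{\mathrm{gene}} \times (N_{\mathrm{exp}} - 1) + \mathrm{Var}_{\mathrm{ctrl}} \times (N_{\mathrm{ctrl}} - 1)$, where $\mathrm{Var}_{\mathrm{gene}}$ is the variance of 
sgRNA effects (pZ) for a given gene, $N_{\mathrm{exp}}$ is the number of sgRNAs for 
a given gene and $N_{\mathrm{ctrl}}$ is the average number of sgRNAs per gene in a 
given screen. 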
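
The T-score definition above translates directly into code. The sketch below is an illustration under stated assumptions: the control variance term is taken to be the variance of the negative-control pZ values (the text does not define it explicitly), sample variances are used, and the function and argument names are hypothetical.

```python
import math
import statistics


def t_score(gene_pz, ctrl_pz, n_ctrl):
    """Modified t-value score for one gene.

    gene_pz : pZ values of the sgRNAs targeting the gene (N_exp values)
    ctrl_pz : pZ values of the negative-control sgRNAs
    n_ctrl  : N_ctrl, the average number of sgRNAs per gene in the screen
    """
    n_exp = len(gene_pz)
    u_gene = statistics.median(gene_pz)
    u_ctrl = statistics.median(ctrl_pz)
    var_gene = statistics.variance(gene_pz)
    var_ctrl = statistics.variance(ctrl_pz)   # assumed meaning of the control variance
    s_var = var_gene * (n_exp - 1) + var_ctrl * (n_ctrl - 1)
    return (u_gene - u_ctrl) / math.sqrt(s_var / n_exp + s_var / n_ctrl)


# Toy example: three sgRNAs for the gene, three controls, N_ctrl = 3.
score = t_score([-4.0, -2.0, -3.0], [-1.0, 0.0, 1.0], n_ctrl=3)
print(round(score, 4))
```

For the toy values, both variances are 1, so the pooled term is 4 and the score is −3/√(8/3).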

To combine data from two experimental replicates, normalized pZ 
scores of sgRNAs from two replicates were pooled together and gene 
effects and phenotype scores were calculated from the pooled sgRNAs 
as described above. 


Calculation of genetic interaction scores 

Genetic interactions of gene pairs in the CDKO library were computed 
as previously described”. In brief, the single-knockout phenotype of an 
sgRNA was calculated from phenotype Z scores of all double sgRNAs 
that have that sgRNA paired with control (safe) sgRNAs. Safe control 
sgRNAs target regions of the genome predicted to be non-functional”. 
The expected double-knockout phenotype of a double-sgRNA pair was 
computed by summing single-knockout phenotypes of two sgRNAs in 
the pair. The difference between the expected double-knockout phenotype 
and the observed double-knockout phenotype of a given double 
sgRNA was then defined as the raw genetic interaction (GI) score of the 
double sgRNA. The raw GI of the double sgRNA was then normalized 
by the standard deviation of 200 double sgRNAs that have the most 
similar expected double-knockout phenotypes to account for systematic 
bias of genetic interactions along increasing phenotype strength 
of double-sgRNAs. These normalized genetic interactions (norm GIs) 
of double sgRNAs were then used to calculate genetic interactions at 
the level of gene pairs. Three sgRNAs were assigned for each gene in 
the library, which gave a total of 9 combinations (3 x 3) for the gene pair 
in one orientation. Since there are two possible orientations for a gene 
pair (for example, A-B and B-A), there are at most 18 double sgRNAs 
that target a gene pair. The norm GI of a gene pair is simply the median 
value of all double-sgRNAs against the gene pair. We used GI_T score and 
GI_M score as statistical scores to measure genetic interactions of gene 
pairs” in the CDKO library. In brief, the GI_T score for a given gene pair 
was calculated on the basis of the modified t-value score and the GI_M score 
is the signed log10 P value measured by Mann-Whitney U-test. Both scores 
take into account the strength and consistency of norm GIs of double 
sgRNAs, adjusted by observed noise levels reflected in non-interacting 
double-sgRNA controls that have at least one safe sgRNA in each pair. 
Mann-Whitney U-test P values were multiple-test corrected to compute 
adjusted FDRs using the Benjamini-Hochberg procedure. In the 145 x 145 
matrix of GI_T scores, genes were hierarchically clustered with correlation 
distance calculated by Pearson correlation coefficients to generate 
the GI map. These correlation distances were also used to rank genes 
by their similarities to CPD in terms of their GI patterns. To combine 
data from two experimental replicates, norm GIs of double sgRNAs 
from two replicates were pooled together and norm GIs of genes and 
GI scores were then computed as described above. 
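
The core of the genetic-interaction calculation (expected phenotype, raw GI, neighbourhood normalization and gene-pair median) can be sketched as follows. This is a simplified illustration, not the published implementation: the sign convention (observed minus expected), the inclusion of each double sgRNA in its own neighbourhood, the use of the sample standard deviation and all names are assumptions.

```python
import statistics


def expected_double_ko(single_a, single_b):
    """Expected double-knockout phenotype: sum of the two single phenotypes."""
    return single_a + single_b


def raw_gi(observed, expected):
    """Raw GI of a double sgRNA (sign convention assumed: observed - expected)."""
    return observed - expected


def normalize_gi(expected, raw, k=200):
    """Divide each raw GI by the standard deviation of the raw GIs of the k
    double sgRNAs with the most similar expected phenotypes."""
    norm = []
    for e_i, r_i in zip(expected, raw):
        nearest = sorted(range(len(expected)),
                         key=lambda j: abs(expected[j] - e_i))[:k]
        sd = statistics.stdev([raw[j] for j in nearest])
        norm.append(r_i / sd)
    return norm


def gene_pair_gi(norm_gis):
    """Gene-pair norm GI: median over the (at most 18) double sgRNAs."""
    return statistics.median(norm_gis)


# Toy data with k = 3 so that the neighbourhoods are easy to follow.
expected = [0.0, 0.1, 0.2, 5.0]
raw = [1.0, -1.0, 1.0, 3.0]
norm = normalize_gi(expected, raw, k=3)
print(norm, gene_pair_gi(norm))
```

The neighbour search here is quadratic in the number of double sgRNAs, which is fine for a toy example; a library of roughly 2 × 10^5 pairs would call for sorting once and using a sliding window.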


Annotation of cancer genes, TSGs and oncogenes 

The Catalogue of Somatic Mutations in Cancer (COSMIC” v.86) was 
used to annotate genes as tumour suppressors or oncogenes. COSMIC 
is an expert-curated database of 719 somatic mutations for which roles 
in cancer are manually annotated by experts in the field. There are seven 
defined roles of the mutations in the database: oncogene, oncogene 
fusion, TSG, TSG fusion, fusion, oncogene-TSG and oncogene-TSG-fusion. 
For analysis of gene phenotypes and comparison to roles in 
cancer, we pooled genes in oncogene and oncogene-fusion catego- 
ries and defined them as oncogenes. Genes in TSG and TSG-fusion 
categories were defined as TSGs. 


Analysis of lung cancer mutations 

Comparisons between CRISPR phenotypes of genes and their signifi- 
cance as lung cancer mutations were performed using previously pub- 
lished data for lung cancers”. In the dataset, exome sequences and copy 
number profiles of 660 lung adenocarcinoma and 484 lung squamous 
cell carcinoma tumour-normal pairs were analysed. This generated a 
list of 11,249 genes that were reported to be mutated at least once in the 
lung cancer samples. Their mutational significances were computed 
with MutSig2CV~ and also provided in the dataset. Sign-flipped log10 
MutSig2CV q values were then summed and displayed as cumulative 
sum plots along genes sorted by different screening phenotypes. 
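
A cumulative sum plot of this kind can be generated as sketched below. The example assumes that genes are ranked by the absolute value of their screening phenotype (as in the legend of Fig. 2d), that genes without a reported q value contribute nothing, and that q values are floored to avoid taking the logarithm of zero; the floor and the function name are illustrative choices.

```python
import math


def cumulative_significance(phenotypes, q_values, q_floor=1e-300):
    """Rank genes by |phenotype| (strongest first) and accumulate -log10(q).

    phenotypes : dict gene -> screening phenotype score
    q_values   : dict gene -> MutSig2CV q value (missing genes count as q = 1)
    Returns a list of (gene, running sum) tuples in rank order.
    """
    ranked = sorted(phenotypes, key=lambda g: abs(phenotypes[g]), reverse=True)
    running = 0.0
    curve = []
    for gene in ranked:
        q = max(q_values.get(gene, 1.0), q_floor)
        running += -math.log10(q)
        curve.append((gene, running))
    return curve


# Toy example with three hypothetical genes.
curve = cumulative_significance({"A": -3.0, "B": 1.0, "C": 2.0},
                                {"A": 0.01, "C": 0.1})
print(curve)
```

A curve that rises steeply at the left indicates that the strongest screening hits are enriched for significantly mutated genes.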


Analysis of DepMap CRISPR datasets 

The Avana dataset (v.18Q4) with CERES effects of ~18,000 genes across 
517 cell lines was downloaded from the DepMap website (https://depmap.org/portal/download/). 
To measure the percentage of positive hits 
in the CERES cell lines, absolute CERES effects were used to sort genes 
in descending order in each cell line. The first 1,000 genes were selected 
and the percentage of genes with positive CERES effects was measured 
in the 1,000 genes for each cell line. Cell lines were then grouped by 
their tissues of origin and the percentage of positive hits in each cancer 
were plotted as box plots (Fig. 1a). To define 50 core essential genes, 
we averaged CERES effects across the 517 cell lines. Genes were then 
sorted by average CERES effect in ascending order and the 50 genes 
with the most negative or toxic average CERES effects were defined as 
‘core’ essential genes. To measure correlation of genes in terms of their 
cancer dependencies, CERES effects were first subject to a PCA-based 
correction method for genome-wide screening data”. This bias correc- 
tion was shown to bolster the sensitivity and specificity of detecting 
true co-essentiality of gene pairs. Pearson correlation coefficients of 
genes were measured in the matrix of batch-corrected CERES effects. 
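
The two summary statistics described above, the percentage of positive hits among the strongest effects in a cell line and the set of core essential genes, can be computed as in the following sketch. The data layout (nested dictionaries) and function names are assumptions for illustration; the PCA-based bias correction and the correlation step are not shown.

```python
import statistics


def percent_positive_hits(effects, top_n=1000):
    """Percentage of positive effects among the top_n genes ranked by
    absolute effect in one cell line (effects: dict gene -> CERES effect)."""
    top = sorted(effects.values(), key=abs, reverse=True)[:top_n]
    n_positive = sum(1 for e in top if e > 0)
    return 100.0 * n_positive / len(top)


def core_essential_genes(effects_by_line, n_core=50):
    """Genes with the most negative effect averaged across cell lines
    (effects_by_line: dict cell line -> dict gene -> CERES effect)."""
    genes = set.intersection(*(set(d) for d in effects_by_line.values()))
    avg = {g: statistics.mean([d[g] for d in effects_by_line.values()])
           for g in genes}
    return sorted(avg, key=avg.get)[:n_core]


# Toy data with hypothetical gene and cell-line names.
one_line = {"g1": -2.0, "g2": 1.5, "g3": 0.1, "g4": -0.2}
panel = {"L1": {"a": -1.0, "b": 0.0, "c": -2.0},
         "L2": {"a": -1.5, "b": 0.2, "c": -1.0}}
print(percent_positive_hits(one_line, top_n=2))
print(core_essential_genes(panel, n_core=2))
```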


Identification of enriched co-essential functional modules 

We used generalized least squares (GLS) to map co-essential inter- 
actions across all pairs of genes in the Avana dataset (v.18Q3) while 
automatically accounting for relatedness between cell lines”’; unlike 
conventional approaches to co-essentiality mapping based on Pearson 
correlation, this approach yields non-inflated P values. We applied GLS 
to the matrix of CERES effects corrected with the PCA-based correction 
method described above”. We then applied the ClusterONE cluster- 
ing algorithm, originally developed to discover protein complexes 
de novo from protein-protein interaction data, to cluster genes into 
‘co-essential modules’ in an unbiased fashion, based on their co-essen- 
tiality profiles across all other genes. Specifically, we ran ClusterONE on 




the gene-by-gene matrix of GLS P values after row-wise FDR correction, 
with edge weights set to one minus the FDR q value. To determine 
which co-essential modules were enriched in the different screening 
phenotypes, we used the Mann-Whitney U-test to assess whether the 
phenotype scores of the members of a given module were distributed 
significantly differently from those of all genes. 
Sign-flipped log10 Mann-Whitney U-test P values and median effects 
of members in co-essential modules were plotted in volcano plots as y 
axis and x axis, respectively (Fig. 3a, Extended Data Fig. 4c). The most 
enriched co-essential modules from different screen phenotypes were 
then analysed. While we used GLS to define co-essential modules, we 
used batch-corrected CERES effects for visualizing co-essentiality of 
gene pairs in all scatter plots and cluster maps (Fig. 3c, Extended Data 
Fig. 4e, f). 
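
As an illustration of the row-wise FDR correction and edge weighting described above, the following Python sketch converts each row of a P value matrix into q values and then into edge weights of one minus q. It assumes the Benjamini-Hochberg procedure, which the text does not name for this step, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that q values
    # never decrease as P values increase.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


def edge_weights(p_matrix):
    """Row-wise FDR correction followed by weight = 1 - q for each gene pair."""
    return [[1.0 - q for q in benjamini_hochberg(row)] for row in p_matrix]


print(edge_weights([[0.01, 0.04, 0.03, 0.5]]))
```

How self-pairs on the diagonal and any asymmetry introduced by row-wise correction were handled before clustering is not stated in the text and is left out of this sketch.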


PANTHER pathway-enrichment analysis 

To determine which pathways were enriched among the top hits from 
the different screen phenotypes, we uploaded the top 1,000 hits from 
each screen phenotype into the gene ontology knowledgebase website 
(http://geneontology.org/). We then performed the PANTHER overrepresentation 
test with PANTHER pathways as the annotation dataset. 
Significance of enriched pathways was measured with Fisher's exact test 
and pathways that passed a 5% FDR cutoff were displayed as significantly 
enriched pathways for each phenotype with the indicated log10 FDR. 
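
The overrepresentation statistic can be sketched in a few lines of Python. This is not the PANTHER implementation: it assumes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution), and the function and argument names are hypothetical.

```python
from math import comb


def overrepresentation_p(hits_in_pathway, n_hits, pathway_size, n_background):
    """One-sided Fisher's exact P value for pathway overrepresentation.

    Probability of seeing at least hits_in_pathway pathway genes in a hit
    list of n_hits genes drawn from n_background genes, of which
    pathway_size belong to the pathway.
    """
    upper = min(n_hits, pathway_size)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(pathway_size, k) * comb(n_background - pathway_size, n_hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


# Toy numbers: 2 of 3 hits fall in a 4-gene pathway, background of 10 genes.
print(overrepresentation_p(2, 3, 4, 10))
```

The resulting P values would then be corrected for multiple testing across pathways before the 5% FDR cutoff is applied.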


Subcutaneous transplantation and analysis of subcutaneous 
tumours 

Ten- to twelve-week-old female NSG mice of similar weights were used 
for cell transplantation experiments. To determine the number of H23- 
derived cell lines to inject, several dilutions of cells (2 x 10°, 1 x 10°, 2 x 
10° and 4 x 10°) were injected into both flanks and both shoulders of 
one NSG recipient mouse per dilution (n = 4 mice; 16 tumours total). 
After ten days, 4 out of 4 palpable tumours formed from the 4 x 10° cell 
injections, compared to 0 out of 4 for 2 x 10° cell injections, 1 out of 4 
for the 1 x 10° cell injections and 1 out of 4 for the 2 x 10° cell injections; 
therefore 4 x 10° or more cells were used for all subsequent injections. 
For the batch re-test CRISPR screens, H23 cells were transduced with the 
library as described above. After selecting the cells with puromycin, 8 x 
10° library-transduced cells in 100 µl PBS were injected into both flanks 
of NSG recipient mice (n = 10 mice; 20 tumours total). Ideally, this would 
represent ~13,000x cell number coverage for the library, although the 
actual cell number coverage per sgRNA was likely much lower since a 
large portion of injected cells would not contribute to tumour develop- 
ment after subcutaneous transplantation. Four weeks after transplanta- 
tion, tumours were removed and homogenized using a tissue blender 
(Omni International, no. TH115-PCR), which was cleaned between each 
sample. Ten tumours from left flanks were pooled together as one 
experimental replicate and the other 10 tumours from right flanks were 
pooled together as the second experimental replicate. Genomic DNA 
was then extracted from these two pools using QIAamp DNA Blood 
Maxi Kit (Qiagen, no. 51194) with the manufacturer’s protocol. To PCR- 
amplify sgRNA cassettes from genomic DNA for deep sequencing, we 
used ~15x more genomic DNA than what we would use for samples from 
in vitro CRISPR screens. In brief, we scaled a reaction based on ~10 µg 
of genomic DNA in 100 µl of PCR reaction for each ~300 sgRNAs in the 
library. This was to account for genomic DNA that came from tumour 
infiltrating mouse cells. Amplified PCR samples were sequenced on a 
NextSeq 550 as described above. For the competitive growth assays in 
tumours, a total of 4 x 10° H23-derived cells with roughly equal numbers of 
mCherry (gene-targeting sgRNAs) and GFP (safe sgRNAs) expressing 
cells in 100 µl PBS were injected into both flanks of four NSG recipient 
mice per genotype (n = 12 mice total across three groups; 24 tumours 
total). Thirty days after transplantation, subcutaneous tumours were 
individually dissected, roughly chopped using dissecting scissors and 
further dissociated into a single-cell suspension using collagenase IV, 


dispase and trypsin at 37 °C for 30 min with rotation. After digestion, 
samples were passed through a 40-µm filter and maintained in PBS with 
2% FBS, 2 mM EDTA and 1 U ml−1 DNase before analysis by fluorescence-activated 
cell sorting (FACS). For FACS analysis, the mCherry/GFP ratio 
was determined at day 0 before subcutaneous injection and at day 
30 from dissociated tumours. Log fold change of mCherry/GFP ratio 
between these two time points was calculated and normalized to the 
control mix (safe mCherry/safe GFP) (Fig. 3h). The Stanford Institute of 
Medicine Animal Care and Use Committee approved all animal studies 
and procedures. 


Histologic preparation and immunohistochemistry 

Tumours from the in vivo competition assay were fixed with 4% formalin 
in PBS overnight and transferred to 70% ethanol before paraffin embedding. 
Paraffin-embedded tumours were sectioned into 4-µm-thick 
slices, deparaffinized with xylene and ethanol and antigen-retrieved 
in citrate buffer. Immunohistochemical staining for GFP (Abcam, 
ab13970, 1:250) and mCherry (Abcam, ab167453, 1:250) was performed 
on these 4-µm-thick sections. Alexa Fluor 488 secondary antibody 
(ThermoFisher Scientific, A-11039) and Alexa Fluor 594 secondary 
antibody (ThermoFisher Scientific, A-11012) were added with Hoechst 
to visualize GFP, mCherry and nuclei in the subsequent immunofluo- 
rescence imaging. Images were taken on an inverted epifluorescence 
microscope (Eclipse Ti, Nikon) using 10x and 20x objectives. 


The 1D4 reporter system 

A 1D4 epitope was placed just upstream of the RKRR motif in the IGF1R 
α-chain and a Flag epitope was placed at the C terminus of the IGF1R 
β-chain (1D4-RKRR) (Fig. 4b). One or two additional amino acids were 
inserted between the 1D4 epitope and the RKRR motif in the control 
reporters (1D4-ERKRR, 1D4-PERKRR). An additional control reporter 
had an HA epitope instead of 1D4 (HA-RKRR reporter). 


Immunofluorescence imaging 

For immunofluorescence imaging, cells were either fixed with 4% paraformaldehyde 
in PBS for 15 min at room temperature, or fixed with ice-cold 
methanol at 4 °C for 15 min; we used methanol fixation for the CPD antibody 
(A305-514A-M, ThermoFisher) and paraformaldehyde fixation for all 
other antibodies. Cells were washed twice with PBS 
and subsequently permeabilized with 0.2% Triton X-100 in PBS for 15 
min at 4 °C for paraformaldehyde-fixed samples. Cells were blocked 
with 3% BSA in PBS for 1 h at room temperature. Primary and secondary 
antibodies were diluted in PBS containing 3% BSA. Cells were first 
incubated with the primary antibodies overnight at 4 °C. Cells were 
then washed three times with PBS and incubated with the secondary 
antibodies and Hoechst for 2 h before a triple wash in PBS. To quantify 
IGF1R-signalling activities in 2D monolayer cells, cells were processed 
in a 96-well plate and imaged either on an inverted epifluorescence 
microscope (ImageXpress Micro, Molecular Devices) using a 
10x objective or on a spinning-disk confocal microscope (Eclipse Ti, 
Nikon, CSU-W1, Yokogawa) using a 20x objective. More than four sites 
were acquired from each well and fluorescence signals were quantified 
across multiple image sites per condition. For the 1D4 assays, 
CPD staining and IGF1R staining in 3D spheroids, cells were processed 
in glass-bottom 24-well plates and imaged using the spinning-disk 
confocal microscope with a 10x or 20x objective. Primary antibodies 
were obtained from the following sources: IGF1R α-chain and CPD 
antibodies from ThermoFisher (AHR0321, A305-514A-M); antibodies 
to MET, phospho-AKT (Ser473), phospho-ERK1/2 (Thr202/Tyr204) and 
Flag from Cell Signaling Technology (no. 8198, 4060, 4370 and 14793); 
Rho 1D4 antibody from Millipore (MAB5356). 


Individual sgRNA validations using automated microscopy 
H23 cell lines expressing the indicated sgRNAs were seeded either in 
tissue-culture treated (2D monolayers) or ultra-low-attachment (3D 


spheroids) 24-well plates and loaded into an inverted epifluorescence 
microscope (IncuCyte S3 or IncuCyte ZOOM, Essen Bioscience) compatible 
with live-cell imaging. For the competition assays, ~50,000 
cells expressing gene-targeting sgRNA (mCherry) were mixed with 
~50,000 cells expressing safe sgRNA (GFP) and seeded into one well of a 
24-well plate. Images were taken every 4 h for the next 72 h. mCherry/GFP 
ratios were then compared between the 0 h and 72 h time points to 
track fold changes in the ratio. Fold changes in the ratios of samples 
were then normalized to the fold change in the ratio of the safe mCherry 
and safe GFP mix to estimate the 2D and 3D growth phenotypes of 
sgRNAs relative to the control. In addition, the normalized 3D fold changes were 
divided by the normalized 2D fold changes to estimate 3D/2D growth 
phenotypes of sgRNAs. For imaging colony size from H23 knockout 
cell lines, ~100,000 cells expressing gene-targeting sgRNAs (mCherry) 
were seeded into ultra-low attachment 24-well plates in the presence 
of 100 nM Sytox Green. Size and cell death of 3D spheroids from each 
knockout line was then monitored for the next 72 h. All experiments 
were performed in triplicate and sequences of sgRNAs used for the 
validation are listed in Supplementary Table 10. 
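
The normalization used in the competition assays can be written out as a short Python sketch. The function names and the tuple layout (ratio at 0 h, ratio at 72 h) are assumptions made for illustration and do not come from the authors' analysis scripts.

```python
def normalized_fold_change(sample_t0, sample_t72, control_t0, control_t72):
    """Fold change in the mCherry/GFP ratio, normalized to the safe/safe mix."""
    return (sample_t72 / sample_t0) / (control_t72 / control_t0)


def growth_phenotypes(sample_2d, sample_3d, control_2d, control_3d):
    """Each argument is a (ratio at 0 h, ratio at 72 h) tuple."""
    fc_2d = normalized_fold_change(*sample_2d, *control_2d)
    fc_3d = normalized_fold_change(*sample_3d, *control_3d)
    # The 3D/2D phenotype is the normalized 3D fold change divided by
    # the normalized 2D fold change.
    return {"2D": fc_2d, "3D": fc_3d, "3D/2D": fc_3d / fc_2d}


result = growth_phenotypes(
    sample_2d=(1.0, 1.0), sample_3d=(1.0, 0.5),
    control_2d=(1.0, 1.0), control_3d=(1.0, 1.0),
)
print(result)
```

In this toy case the knockout has no effect in 2D and halves the ratio in 3D, so the 3D/2D phenotype is 0.5.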


Rescue experiment with growth factors 

The competitive growth assays between CPD-null H23 cells and control 
H23 cells were performed in the presence of 50 ng ml−1 of IGF1 (PHG0071, 
ThermoFisher), EGF (E9644, Sigma-Aldrich) or HGF (294-HG-005, R&D 
Systems). The competitive growth assay was performed as described 
in the sgRNA validation experiments, but in this case, the indicated 
growth factor was added at the beginning of the experiment to measure 
its ability to rescue gene loss phenotypes. 


Drug-titration experiments 

For the drug-titration experiments, ~16,000 cells were seeded into 
tissue-culture treated 96-well plates in RPMI 1640 growth medium 
(2D monolayers) or ultra-low attachment 96-well plates in RPMI 1640 
growth medium with 0.75% methylcellulose. Cells were then grown 
for the next 72 h in the presence of titrated inhibitors. At the 72 h point, 
1/10th volume of alamarBlue reagent (ThermoFisher, DAL1100) was 
added to cells and incubated for ~2 h for 2D monolayer cells and ~10 h 
for 3D spheroids at 37 °C. Fluorescence signals were then measured 
in a fluorescence plate reader (TECAN, no. 30016056; excitation at 
560 nm, emission at 590 nm) to estimate relative number of live cells 
at different dosages of the inhibitors. Wild-type H23 cells were used 
in the experiments where efficacies of small molecule inhibitors 
were compared between 2D and 3D. To test whether CPD deletion 
sensitizes cells against ARS-853, H23 cells with safe sgRNA and with 
CPD sgRNA (no fluorescent marker) were used. Small-molecule inhibitors were 
obtained from the following sources: savolitinib from Selleckchem 
(no. S7674), linsitinib from VWR (no. 10189-468), FURIN inhibitor I 
from Sigma Aldrich (no. 344930) and ARS-853 from Cayman Chemical 
(no. 1629268-00-3). 
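
A dose-response curve of the kind described above can be derived from raw plate-reader values as in the following Python sketch. The optional subtraction of blank wells is an assumption (the text does not say whether background was subtracted), and the function name and data layout are hypothetical.

```python
from statistics import mean


def relative_viability(treated_wells, untreated_wells, blank_wells=None):
    """Mean fluorescence at each dose relative to untreated cells.

    treated_wells: {dose: [fluorescence per replicate well]}
    untreated_wells: [fluorescence per untreated replicate well]
    blank_wells: optional medium-only wells used as background (assumption).
    """
    background = mean(blank_wells) if blank_wells else 0.0
    untreated = mean(untreated_wells) - background
    return {
        dose: (mean(wells) - background) / untreated
        for dose, wells in treated_wells.items()
    }


curve = relative_viability({0.1: [900, 1100], 1.0: [500, 500]}, [1000, 1000])
print(curve)
```

Plotting these fractions against inhibitor dose for 2D and 3D cultures gives titration curves that can be compared directly.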


Immunoblotting 

Cells were lysed in RIPA buffer containing phosphatase and protease 
inhibitor cocktails (Roche, no. 11697498001). Lysates were then incubated 
on ice for 15 min, then clarified at 16,000g, 4 °C, for 10 min. Protein 
was quantified using the Bradford method and lysates were made 
with NuPage Sample Buffer (4x). Membranes were then probed with 
KRAS and GAPDH antibodies (1:1,000 dilution) from ThermoFisher (no. 
415700, AM4300). The following secondary antibodies were used at a 
1:5,000 dilution: anti-rabbit or anti-mouse IRDye-conjugated secondary 
antibodies from Fisher Scientific (no. NC9401841, NC9401842, 
NC0110517 and NC9030091). Finally, membranes probed with the 
IRDye-conjugated antibodies were imaged on an infrared imaging 
system (Li-Cor, Odyssey CLx). Uncropped western blots are shown in 
Supplementary Fig. 1. 


Knocking down genes in established spheroids 

To knock down genes in established spheroids, we transduced rtTA 
and inducible KRAB-dCas9-T2A-mCherry under control of a tet-on 
promoter into H23 cells. These cells were treated with doxycycline for 
two days and were sorted for mCherry signal by FACS to select cells that 
can reliably induce dCas9 expression upon doxycycline treatment. 
Doxycycline was withdrawn from the sorted cells and cells were sorted 
again for loss of mCherry signal to establish an inducible CRISPRi cell 
line that can turn off dCas9 upon doxycycline withdrawal. This cell line 
was transduced with CRISPRi sgRNAs against CPD and KRAS. These 
cells were then seeded to form spheroids for 48 h, after which doxycycline 
was added at a concentration of 0.2 µg ml−1 to induce knockdown of the 
target genes in the established spheroids. Growth of spheroids was then 
monitored for the next 5 days in an automated microscope (IncuCyte 
S3, Essen Bioscience). 


PRECOG analysis 

PRECOG analysis was performed as previously described. In brief, lung 
adenocarcinoma datasets were merged by normalizing CPD expres- 
sion within each cohort so that its mean and s.d. were 1 across stage 
1 patients. The merged set of 1,321 patients was split into high versus 
low CPD on the basis of the median expression of CPD across the entire 
dataset. Kaplan-Meier analysis was used to assess association with 
overall survival, with the P value calculated by log-rank test. PRECOG Meta-Z 
scores for genes in the CPD module across different cancer types were 
obtained from the PRECOG website (https://precog.stanford.edu/). 
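
The median split and the Kaplan-Meier estimate can be sketched in plain Python as follows. This is an illustration rather than the PRECOG code: the function names are hypothetical, patients with expression equal to the median are assigned to the low group as an arbitrary choice, and the log-rank test is not implemented.

```python
from statistics import median


def split_by_median(expression):
    """Split patients into high and low groups at the median expression.

    expression: {patient: normalized CPD expression}. Ties with the median
    are placed in the low group (an arbitrary choice for this sketch).
    """
    cutoff = median(expression.values())
    high = {patient for patient, value in expression.items() if value > cutoff}
    return high, set(expression) - high


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed death time.

    events[i] is True for a death and False for a censored observation.
    """
    curve, survival = [], 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


high, low = split_by_median({"p1": 0.2, "p2": 1.5, "p3": -0.3, "p4": 0.9})
print(sorted(high), sorted(low))
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

One curve would be computed for each group and the two curves then compared with a log-rank test.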


RNA-seq experiment and analysis 

H23 cells expressing control (safe) sgRNA or CPD sgRNA were cul- 
tured as 2D monolayers or 3D spheroids in 100-mm tissue culture 
plates. RNA was extracted with TRIzol (ThermoFisher, 15596026) 
and processed with an RNA-seq library preparation kit (Illumina, 
RS-122-2101) to produce libraries for deep sequencing on NextSeq 
550. Library preparation and sequencing were performed according 
to the manufacturer’s protocol. Sequencing reads were mapped to 
the combined indices of cDNAs and non-coding RNA transcripts from 
GRCh38 genome reference using Kallisto. Differentially regulated 
genes between the two different conditions were analysed using 
Sleuth. Here, Sleuth computed FDRs for differential regulation of 
transcripts. If a gene had multiple transcripts, the best (lowest) FDR value 
from all the transcripts was chosen to represent the FDR for differential 
regulation of the gene. We then defined a set of differentially 
regulated genes using a 5% FDR cut-off. Genes significantly downregulated 
in CPD-deleted 3D spheroids compared to control 3D spheroids 
were further analysed for their predictive power for survival rates of 
patients with lung cancer. 
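
The collapse from transcript-level to gene-level FDR values can be illustrated with a minimal Python sketch. The dictionaries stand in for Sleuth output and a transcript-to-gene map, both with hypothetical identifiers, and treating the 5% cut-off as a strict inequality is an assumption.

```python
def gene_level_fdr(transcript_fdr, transcript_to_gene):
    """Represent each gene by the lowest FDR among its transcripts."""
    best = {}
    for transcript, fdr in transcript_fdr.items():
        gene = transcript_to_gene[transcript]
        best[gene] = min(fdr, best.get(gene, 1.0))
    return best


def differentially_regulated(gene_fdr, cutoff=0.05):
    """Genes passing the FDR cut-off (strict inequality assumed)."""
    return sorted(gene for gene, fdr in gene_fdr.items() if fdr < cutoff)


fdr = gene_level_fdr(
    {"tx1": 0.20, "tx2": 0.01, "tx3": 0.30},
    {"tx1": "GENE_A", "tx2": "GENE_A", "tx3": "GENE_B"},
)
print(fdr, differentially_regulated(fdr))
```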


TCGA outcome analysis in downregulated genes upon CPD 
deletion 

TCGA lung adenocarcinoma gene expression data (FPKM-UQ) and 
outcome and clinical data were downloaded from gdc.cancer.gov. 
We used GSVA to study the association with outcome of the genes 
associated with the CPD-deleted phenotype. RNA-seq counts were 
normalized using Limma voom. Outcome data were censored to 
seven years. Kaplan-Meier plots were generated using the survminer 
package from Bioconductor. High versus low CPD GSVA 
score was defined using the 1/3 upper versus 1/3 lower quantiles. 
Log-rank test P values are reported. Additionally, we built a Cox 
proportional-hazard model to account for key clinical covariates 
including age, stage, gender and TP53 and KRAS status. We also 
studied the interaction between CPD GSVA score and KRAS muta- 
tion status using a Cox proportional-hazard model with the same 
covariates. 
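
Two of the data-preparation steps above, censoring at seven years and the split into upper and lower thirds, can be sketched in Python as follows. The rank-based definition of the thirds and the function names are assumptions made for illustration; the original analysis was carried out with R packages.

```python
def censor(time_years, event, horizon=7.0):
    """Censor follow-up at the horizon: later events count as not observed."""
    if time_years > horizon:
        return horizon, False
    return time_years, event


def tertile_groups(scores):
    """Assign the lowest third of scores to 'low' and the highest to 'high'.

    scores: {patient: GSVA score}. The middle third is left out.
    """
    ranked = sorted(scores, key=scores.get)
    k = len(ranked) // 3
    return {"low": ranked[:k], "high": ranked[len(ranked) - k:]}


print(censor(9.2, True), censor(3.0, True))
groups = tertile_groups({"a": 0.1, "b": 0.9, "c": 0.5, "d": -0.2, "e": 0.7, "f": 0.3})
print(groups)
```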


Statistical analysis 

The statistical significance of differences between the averages of two 
experimental groups in all box plots and bar graphs in this study 
was computed using an unpaired, two-tailed Student’s t-test. No statistical 
methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation 
during experiments and outcome assessment. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Sequencing data from all CRISPR screens and RNA-seq experiments 
are available under BioProject accession number PRJNA535417. All 
other data are available from the corresponding author upon reason- 
able request. 


Code availability 


All screening data were analysed with custom Python scripts (v.2.7) that 
are available at https://github.com/biohank/CRISPR_screen_analysis. 
Custom Matlab scripts (v.2015b) were used to quantify signals from all 
immunofluorescence images and to analyse FACS data: these scripts 
can be requested from K.H. 


Extended Data Fig. 1 | High quality and reproducibility of 2D and 3D genome-wide 
CRISPR screens and hits with differential effects in the two conditions. 
a, H23 cells expressing mCherry were seeded at different densities in ultra-low 
attachment plates in the presence of 0.75% methylcellulose. Sytox Green was 
added at 100 nM concentration. Average mCherry signal and Sytox Green 
signal measured across single cells were used to estimate the total numbers of 
live cells and dead cells at each seeding density. Cell growth and death rates 
were then monitored simultaneously on a live-cell microscope for 60 h. We 
aimed for a cell death rate of about 30% during the initial growth phase of 
spheroids, and 10° cells per well (1.9 cm²) was the chosen cell seeding density 
for our genome-wide screens in 3D spheroids. b, Two-dimensional growth 
phenotypes of 20,463 genes were highly reproducible between experimental 
replicates (top). Sequencing counts of 208,687 sgRNAs in a T0 sample and a day 
21 sample from the 2D genome-wide screens (bottom) show that most 
negative-control sgRNAs (red dots) are not enriched or disenriched between 
T0 and day 21 (black dots). This indicates that the complexity of the genome-wide 
library was maintained throughout the 2D screen. In the top plot, the data are 
fit by a linear regression line (blue dotted line). The grey line marks a 1:1 
diagonal. c, The quality and reproducibility of the 3D screens were comparable 
to those of the 2D screens, suggesting that the scalable 3D spheroid culture 
system is on a par with traditional 2D culture methods for its performance in 
genome-scale CRISPR screens. n = 20,463 genes (top); n = 208,687 sgRNAs 


(bottom). In the top plot, the data are fit by a linear regression line (blue dotted 
line). The grey line marks a 1:1 diagonal. d, Cumulative distribution of 
sequencing reads for sgRNAs in the genome-wide CRISPR library. Read counts 
were normalized by total reads for each sample and the cumulative sums of 
sgRNAs were plotted as relative percentages of the number of expected 
sgRNAs. e, Cumulative sums of TSG counts (left) or oncogene counts (right) are 
plotted against genes sorted by their 2D, 3D or 3D/2D phenotypes (T-score) 
from the genome-wide screens in H23 cells. TSGs are expected to have positive 
growth phenotypes when deleted. Therefore, genes are sorted in descending 
order from the most positive to the most negative phenotypes in the left plot. 
Oncogenes are expected to have negative or toxic growth phenotypes and 
genes are sorted in ascending order in the right plot. Black dotted line, 
randomly sorted genes. The first 4,000 genes are displayed. f, Summary of hits 
with differential 3D/2D phenotypes. Top positive (red-filled circles) and 
negative (blue-filled circles) hits from the differential 3D/2D phenotypes reveal 
many cancer-relevant genes associated with transcriptional regulation, cell 
motility, cell adhesion and energy metabolism. Cancer-signalling pathways 
such as Ras-MAPK, TGFβ, MET, Rho, β-catenin and Hippo signalling are highly 
represented. Sizes of circles are proportional to 3D/2D phenotype scores. 

g, The 10 most significant pan-lung cancer genes and 50 top core essential 
genes are marked. Genes sorted by absolute phenotype (T-score) in 2D, 3D and 
3D/2D (see Methods). 


Extended Data Fig. 2 | Genome-wide 2D and 3D CRISPR screens in H1975, 
a lung adenocarcinoma line with an EGFR mutation. a, Distributions of 2D 
and 3D phenotypes are shown as volcano plots. The y axis represents absolute 
T-score for each gene, and the x axis represents effect size of each gene. Size of 
dots represents absolute T-score of genes. b, Prediction of TSGs or oncogenes 
with 2D, 3D and 3D/2D phenotypes in H1975 cells. Cumulative sums of TSG counts 
(top panel) or oncogene counts (bottom panel) are plotted against genes 
sorted by their 2D, 3D, or 3D/2D phenotypes (T-score) from the genome-wide 
screens in H1975 cells. These data indicate 3D or differential 3D/2D phenotypes 
show marked improvement for prediction of TSGs when compared to the 2D 
phenotypes, with marginal improvement for predicting oncogenes. In the box 
plots, centre lines mark median, box limits mark upper and lower quartiles, 
whiskers show 1.5x interquartile range and points indicate outliers. c, Enriched 
pathways among the top 1,000 hits from each culture condition were analysed 
using PANTHER overrepresentation test. Significance of enriched pathways 
was measured with Fisher’s exact test and the Benjamini-Hochberg FDR was 
subsequently computed (x axis). The EGFR signalling pathway, a known driver 
for H1975 cells, is enriched in only 3D or 3D/2D phenotypes. Number of genes 
for enriched pathways are marked to the right of bars. d, The cumulative sum of 
the significance of 11,249 pan-lung cancer mutations from 1,144 patients with 
lung cancer as measured by MutSig2CV is displayed on the y axis, whereas the x 
axis shows phenotypes for genes sorted by their strength in 2D (solid red line), 
3D (solid blue line) or 3D/2D (solid yellow line). Black dotted line, randomly 
sorted genes. Top 3,000 genes are shown. 


Extended Data Fig. 3 | Genome-wide 2D and 3D CRISPR screens in H2009, a 
lung adenocarcinoma line with a KRAS mutation. a, Distributions of 2D and 
3D phenotypes are shown as volcano plots. The y axis represents absolute 
T-score for each gene, and the x axis represents effect size of each gene. Size of 
dots represents absolute T-score of genes. b, Prediction of TSGs or oncogenes 
with 2D, 3D and 3D/2D phenotypes in H2009 cells. Cumulative sums of TSG 
counts (top) or oncogene counts (bottom) are plotted against genes sorted by 
their 2D, 3D or 3D/2D phenotypes (T-score) from the genome-wide screens in 
H2009 cells. These data indicate that 3D phenotypes, and in particular the 
differential 3D/2D phenotypes, show improved prediction of both TSGs and 
oncogenes when compared with 2D phenotypes. In the box plots, centre lines 
mark median, box limits mark upper and lower quartiles, whiskers show 


1.5x interquartile range and points indicate outliers. c, Enriched pathways 
among the top 1,000 hits from each culture condition were analysed using 
PANTHER overrepresentation test. Significance of enriched pathways was 
measured with Fisher’s exact test and the Benjamini-Hochberg FDR was 
subsequently computed (x axis). The Ras pathway, a known driver for H2009 
cells, is enriched in 3D/2D phenotypes. Numbers of genes for enriched 
pathways are marked to the right of bars. d, The cumulative sum of the 
significance of 11,249 pan-lung cancer mutations from 1,144 patients with lung 
cancer as measured by MutSig2CV is displayed on the y axis, while the x axis 
shows phenotypes for genes sorted by their strength in 2D (solid red line), 3D 
(solid blue line) or 3D/2D (solid yellow line). Black dotted line, randomly sorted 
genes. Top 3,000 genes are shown. 


Extended Data Fig. 4 | High quality and reproducibility of optimized in vivo 
CRISPR screens and analysis of the CPD co-essential module. a, A CRISPR 
sgRNA library targeting 911 hits with differential growth effects in 3D versus 2D 
(Supplementary Table 4) was introduced into H23 cells, which were then introduced by 
subcutaneous injection into NSG mice. After 30 days, tumours were collected 
and sgRNAs were amplified. In vivo growth phenotypes of 911 genes were 
highly reproducible between experimental replicates (left). Sequencing 
counts of TO samples and day 30 samples from the in vivo batch-retest screens 
(right). In the left plot, the data are fit by a linear regression line (blue dotted 
line). b, Cumulative distribution of sequencing reads for sgRNAs in the batch- 
retest library in H23 cells. Read counts were normalized by total reads for each 
sample and the cumulative sums of sgRNAs were plotted as relative 
percentages of the number of expected sgRNAs. c, The 4,034 co-essential gene 
modules based on the DepMap CRISPR dataset are plotted as volcano plots for 
KRASi2D phenotype scores. The y axis shows significance of enrichments of 
co-essential modules as measured in log P values from the two-sided 
Mann-Whitney U-test (see Methods); the x axis shows average gene effects of 
members in CERES modules. d, Genes in the CPD module are indicated among 
17,634 genes sorted by their correlations to CPD. Pearson correlation 
coefficients between CPD and other genes are measured in batch-corrected 
CERES effects in the DepMap CRISPR dataset. e, CERES effects of CPD, FURIN 
and IGF1R are shown as correlation plots. CERES effects are batch-corrected 
before plotting. Blue lines, regression lines. Blue shaded translucent bands, 
95% confidence intervals. f, Lack of correlation between CPD and OR2A25, an 
olfactory receptor, in their CERES effects across 517 cancer lines. 


Extended Data Fig. 5 | Analysis of CPD co-essential module with a 145 x 145 
gene genetic-interaction map. a, Cloning of CDKO library. A total of 463 
sgRNAs targeting 145 hits from the 3D/2D phenotypes were PCR-amplified 
from an oligonucleotide array. These 145 hits include members of the CPD 
co-essential module. sgRNAs were separately cloned into two lentiviral vectors 
with either a mU6 or a hU6 promoter to generate two CRISPR single-knockout 
libraries. hU6-sgRNA cassettes were then cut out from one library and ligated 
into the other library containing the mU6 promoter. This generated a CDKO 
library with all possible pairwise combinations of the 463 sgRNAs (214,369 
double sgRNAs). This CDKO library was used to measure genetic interactions 


(GIs) of 10,440 gene pairs (145 x 145 combinations). b, The 145 x 145 
genetic-interaction map; the 145 x 145 matrix of genetic-interaction scores is shown as 
a heat map. The 145 genes are clustered by the similarities of their genetic 
interactions (Pearson correlation coefficients of genetic interactions) in the 
map. Members of the CPD co-essential module form a cluster (marked with red 
box) in this genetic-interaction map, consistent with their correlations in the 
DepMap CRISPR dataset. c, A genetic-interaction map validates the CPD 
co-essential module in H23. Correlations of genetic interactions are used to 
sort 145 genes on the basis of their similarities to genetic interactions of CPD. 
Genes in the CPD module are marked with red dots along the sorted genes. 
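
The library and gene-pair counts quoted in this caption can be checked with a few lines of Python. The reading that double sgRNAs are ordered pairs (position in the two-promoter vector matters) and that gene pairs are unordered pairs of distinct genes is an interpretation that reproduces the stated numbers, not a statement from the authors.

```python
from math import comb

n_sgrnas = 463
n_genes = 145

# Every sgRNA in the first position combined with every sgRNA in the second.
double_sgrnas = n_sgrnas ** 2
# Unordered pairs of two different genes.
gene_pairs = comb(n_genes, 2)

print(double_sgrnas, gene_pairs)  # 214369 10440
```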


Extended Data Fig. 6 | Validation of individual sgRNAs targeting top hits 
with differential 3D/2D growth effects. a, A schematic showing the 
competitive growth assay used to validate individual sgRNAs in 2D and 3D 
conditions. Cells expressing a gene-targeting sgRNA (mCherry) are mixed with 
cells expressing a control sgRNA (safe sgRNA, encoding GFP). Relative changes 
of mCherry to GFP ratios are monitored to compute growth phenotypes of 
gene-targeting sgRNAs. b, Genes within the CPD module and selected top hits 
with differential effects in 3D versus 2D growth were targeted with individual 
sgRNAs and subjected to competitive growth assays in both 2D and 3D culture. 
Relative 2D and 3D growth phenotypes of individual sgRNAs were measured by 
tracking changes in ratios of mCherry (gene-targeting sgRNAs) to GFP (control 
sgRNA) in the assays by automated fluorescence microscopy (n = 3 wells in a 
24-well plate, mean ± s.e.m.). c, Binary masks of H23 spheroids with the 
indicated gene knockouts. H23 knockout cell lines expressing sgRNAs against 
top hits from the 3D/2D phenotypes were seeded at equal density on ultra-low 


attachment plates. 3D spheroids generated from the knockout lines were 
imaged in a fluorescence microscope 72 h after seeding. For each knockout line, 
48 images were taken from three wells in a 24-well plate using a 10x objective. 
Binary masks were then generated from mCherry signals of 3D spheroids. 
Forty-eight images were then stitched together to be shown as one large image 
for each knockout. d, Relative colony masses of H23 spheroids with gene 
knockouts are quantified and displayed in bar graphs (n = 3 wells in a 24-well 
plate, mean ± s.e.m.). e, Genes in the CPD module and KRAS were targeted with 
corresponding small-molecule inhibitors. Cells were seeded in 96-well plates in 
2D (blue line) and 3D (red line) conditions, and grown in the presence of 
titrating doses of inhibitors for 72 h. Live cells were quantified with alamarBlue 
assays. Relative growth of treated cells compared with the untreated samples 
is plotted in the drug titration curves. n = 3 wells in a 96-well plate for linsitinib 
and n = 4 for all other drugs; mean ± s.e.m. 


Extended Data Fig. 7 | Induced CPD knockdown in established H23 spheroids 
slows growth. a, Doxycycline (Dox; 0.2 µg ml−1) was added to established 
spheroids at 48 h after initial seeding. Spheroids were expressing both 
mCherry and KRAB-dCas9 separated by a T2A sequence under the same 
doxycycline-inducible promoter. Addition of doxycycline rapidly induced 
KRAB-dCas9-T2A-mCherry expression in spheroids (n = 3 wells in a 24-well 
plate, mean ± s.e.m.). b, Immunofluorescence staining of CPD (green) showed 
that CPD sgRNAs 1 and 3 robustly reduced CPD levels in H23 cells expressing the 
inducible KRAB-dCas9 upon doxycycline addition. CPD sgRNA 2 was less 
effective. Mean intensities of CPD immunofluorescence of two biological 
replicates are shown in the bottom bar plot. c, Immunostaining of KRAS 
(green) by western blot showed that KRAS sgRNAs 1 and 3 robustly reduced 
KRAS levels in H23 cells expressing the inducible KRAB-dCas9 upon 
doxycycline addition. KRAS sgRNA 2 was less effective. These experiments 
were repeated twice to confirm the result. d, Relative spheroid growth, five 
days after doxycycline addition, comparing doxycycline-treated and untreated 
samples, measured in control cells and cells expressing CPD and KRAS sgRNAs. 
H23 cells with inducible KRAB-dCas9-T2A-mCherry were first 
transduced with gene-targeting sgRNAs using a lentivirus that also expressed a 
GFP marker. Cells were seeded and allowed to form spheroids for 48 h. 
Doxycycline was then added and growth of spheroids in doxycycline-treated or 
untreated samples was monitored by GFP signal for another five days. 
Spheroids expressing CPD sgRNAs lor 3 and spheroids expressing KRAS 
sgRNAs lor 3 showed markedly reduced growth upon doxycycline addition, 
whereas spheroids expressing control sgRNA did not show any difference 
between doxycycline-treated and untreated samples (n =3 wells ina 24-well 
plate. mean+s.e.m.).e, Growth of spheroids expressing control sgRNA, CPD 
sgRNA3 or KRAS sgRNA 3 were monitored after doxycycline addition. Cells 
were seeded to form spheroids in the first 48 h and growth of spheroids was 
monitored by GFP fluorescence for the next 5 days (n=3 wells ina 24-well plate, 
mean ± s.e.m.).

Extended Data Fig. 8 | CPD deletion inhibits the IGF1R pathway in H322, A549
and H358 cells. a, Representative immunofluorescence images of IGF1R
α-chain (green) in control and CPD-knockout H23 spheroids. b, Quantification
of immunofluorescence in a. IGF1R α-chain intensities averaged across nine
spheroids per condition. *P = 2.2 x 10° (n = 9, two-sided t-test; mean ± s.e.m.).
c-e, IGF1R and phosphorylated AKT levels were quantified from
immunofluorescence images for H322 (c), A549 (d) and H358 (e) cells. The
dotted grey line marks a 100% level (P values calculated using two-sided t-test;
mean ± s.e.m.).

Extended Data Fig. 9 | CPD deletion acts through the IGF1R pathway to
inhibit 3D growth in H23 cells and CPD removes the FURIN-recognition
motif from the C terminus of IGF1R and MET α-chain. a, The growth
phenotype observed upon CPD deletion in H23 cells is rescued by addition of
excess IGF1 (50 ng ml⁻¹) to the growth medium. A CPD- or IGF1R-targeting
sgRNA with mCherry cDNA and a safe sgRNA with GFP cDNA were transduced
into H23 cells separately, mixed in a 50:50 ratio, and cultured in 3D spheroids for
72 h. Ratios of mCherry to GFP at 72 h, normalized to the ratio at T0, are
plotted in the bar graphs. Deletion of either CPD or IGF1R reduced 3D growth of
spheroids, as reflected in the reduced mCherry-to-GFP ratios compared with
control. Treating cells with excess IGF1 ligand (50 ng ml⁻¹) rescued
CPD-deletion phenotypes, whereas addition of EGF or HGF did not. This suggests
that partial inhibition of the IGF1R pathway by CPD deletion can be
compensated by over-activation of the pathway with the excess IGF1 ligand.
IGF1 could not rescue the IGF1R deletion phenotype (n = 2 wells in a 24-well
plate; mean ± s.e.m.). b, Control, CPD-knockout and IGF1R-knockout spheroids
were treated with the indicated growth factors. Sixteen images of mCherry
fluorescence in spheroids expressing a gene-targeting sgRNA vector with
mCherry marker were stitched together to create the images shown.
c, d, IGF1R-1D4 reporters (see Fig. 4b) showed that removal of the
FURIN-recognition site RKRR from the C terminus of IGF1R α-chain after FURIN
cleavage is severely impaired by CPD deletion in H322 (c) and A549 (d) cells.
P values calculated using two-sided t-test; mean ± s.e.m. e, A MET-1D4-KRKKR
reporter (with 1D4 epitope inserted upstream of the FURIN-recognition site
KRKKR in MET, as with IGF1R in Fig. 4b) showed that removal of KRKKR from the
C terminus of MET α-chain is severely impaired by CPD deletion in H322 cells.
Total MET-reporter levels were measured using an antibody against MET and
ratios of 1D4 to MET signal were used to assess the degree of KRKKR
processing in control and CPD-null backgrounds. Error bars show s.e.m. of
biological replicates in a 96-well plate. P values calculated using two-sided
t-test; mean ± s.e.m.
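
The competition readout in panel a can be illustrated with a short calculation. The sketch below is not the authors' analysis code (image quantification in the paper relied on custom Matlab scripts); the function names and fluorescence values are hypothetical, and it only shows how an mCherry-to-GFP ratio at 72 h might be normalized to the ratio at T0 and summarized as mean ± s.e.m. across replicate wells.

```python
import math
from statistics import mean, stdev


def normalized_ratio(mcherry_72h, gfp_72h, mcherry_t0, gfp_t0):
    """mCherry:GFP ratio at 72 h divided by the same ratio at T0."""
    return (mcherry_72h / gfp_72h) / (mcherry_t0 / gfp_t0)


def mean_and_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("s.e.m. needs at least two replicate wells")
    return mean(values), stdev(values) / math.sqrt(n)


# Hypothetical wells: (mCherry 72 h, GFP 72 h, mCherry T0, GFP T0)
wells = [(40.0, 100.0, 50.0, 50.0), (60.0, 100.0, 50.0, 50.0)]
ratios = [normalized_ratio(*w) for w in wells]
avg, sem = mean_and_sem(ratios)
```

A normalized ratio below 1 means that the mCherry-labelled (gene-targeted) population lost ground to the GFP-labelled control population over the 72 h of co-culture.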

Extended Data Fig. 10 | Targeting CPD may have therapeutic effects in
patients with lung cancer. a, Meta-Z scores of genes in the CPD module across
different cancer types, from PRECOG analysis. A positive Z score predicts that
high expression of a given gene is associated with poor prognosis of disease.
Pink bars show that high CPD expression predicts poor prognosis of lung
adenocarcinoma (ADENO) (Z score = 5.59, PRECOG meta-FDR = 3.23 x10).
b, Forest plot showing hazard ratios (HR) of CPD measured from different
datasets (authors and PubMed IDs for the datasets are indicated on the y axis).
The HR is the increase in risk of death for each unit increase in expression of
CPD (see Methods). Blue error bars indicate 95% confidence intervals. The number
of patient samples used for each study is listed to the left of the plot. c, Forest
plot showing the hazard ratios from an adjusted two-sided Cox
proportional-hazard model, using the CPD GSVA score as a continuous variable adjusted by
age, TP53, KRAS, stage and gender. d, Kaplan-Meier plots for patients with lung
cancer with wild-type (left) or mutant (right) KRAS. Variation of a gene set
downregulated by CPD deletion in H23 spheroids was first scored by GSVA (CPD
GSVA score) in patients with lung cancer. Differences in survival among
patients with lung cancer with high versus low CPD GSVA score are illustrated in
Kaplan-Meier plots. High CPD GSVA scores are significantly associated with
poor prognosis in both wild-type and mutant KRAS patient groups. However,
the separation between high and low CPD GSVA groups is larger among
KRAS-mutant patients than wild-type patients, suggesting an interaction between
CPD and KRAS mutations in patients with lung cancer. P values calculated using
a two-sided log-rank test. e, Hazard plots illustrating the two-sided Cox
proportional log relative hazard by expression levels of CPD in KRAS-mutant
versus KRAS wild-type samples. Grey shading corresponds to 95% confidence
intervals. f, CPD deletion sensitizes H358 cells to ARS-853, a KRAS inhibitor.
H358 cells with control safe sgRNA (blue line) or CPD sgRNA (red line) were
treated with increasing doses of ARS-853 for 72 h in both 2D (top) and 3D
(bottom). Live cells were then quantified using alamar blue assay. Relative
growth of treated cells compared with the untreated cells is plotted against
ARS-853 concentration. n = 4 wells in a 96-well plate, mean ± s.e.m. g, CPD
deletion does not show synergy with ARS-853 in H1792 cells. Similar plots as in f
were generated for H1792 cells (n = 4 wells in a 96-well plate, mean ± s.e.m.).
h, IGF1R was quantified from immunofluorescence images of IGF1R staining
across six lung cancer cell lines. H1792 cells show very low IGF1R expression
compared with the other five cell lines. n = 4 for H1437, n = 5 for all other cell
lines, mean ± s.e.m.
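
The Kaplan-Meier curves in panel d rest on the product-limit estimator, which can be sketched in a few lines. This is only an illustration under stated assumptions: the survival analysis in the paper used custom R scripts, the function name and the follow-up times below are hypothetical, and the log-rank test used for the P values is not implemented here.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability), one per event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up times (months) for patients split by score
high = kaplan_meier([5, 8, 12, 20, 24], [1, 1, 1, 0, 1])
low = kaplan_meier([10, 18, 30, 36, 40], [1, 0, 0, 1, 0])
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at observed deaths.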


nature research Corresponding author(s): Michael C. Bassik and Kyuho Han 


Last updated by author(s): Nov 13, 2019 


Reporting Summary 


Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency 
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist. 


Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection NextSeq Control Software (v2.2.0, Illumina) was used for deep-sequencing of CRISPR screening (Fig. 1,2) and RNA-seq samples (Figure 4).
Incucyte ZOOM/S3 control software (v2018a, ESSEN) was used to acquire microscopy images for the competition assay (Figure 3).
MetaXpress (v1.7, Molecular Devices) was used to acquire microscopy images for the IF experiments on the IGF1R signaling pathways
(Figure 3). NIS-Elements (v4.4, Nikon) was used to acquire confocal IF microscopy images (Figure 4). BD CSampler controller (v227, BD
Biosciences) was used to acquire FACS data for the in vivo competition assay (Figure 4).


Data analysis All screening data were analyzed with custom Python scripts (v2.7). They are available at
https://github.com/biohank/CRISPR_screen_analysis. Custom Matlab scripts (v2015b) were used to quantify signals from all IF images and to analyze FACS data;
these scripts will be available upon request from K.H. RNA-seq data were mapped using Kallisto (Bray et al., 2016) and differentially
regulated genes were analyzed using Sleuth (Pimentel et al., 2017). PRECOG analysis was performed with custom Python scripts
previously described in Gentles et al., 2015; requests for the scripts can be made to A.J.G. Custom R scripts (v3.5.3) were used to study the
association between expression scores of CPD-downstream targets and clinical outcome in TCGA lung adenocarcinoma; they will be
available upon request from J.A.S. Incucyte ZOOM or S3 software (v2018a, ESSEN) was used for automated microscopy analysis.
The significance of lung cancer mutations was analyzed using MutSig2CV (Lawrence et al., 2013), available at
https://software.broadinstitute.org/cancer/cga/mutsig.


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


Sequencing data from all CRISPR screens and RNA-seq experiments are available under BioProject accession number PRJNA535417. 


Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size 10- to 12-week-old female NSG mice of similar weights were used for cell transplantation experiments. To determine the number of cells to
inject for H23-derived cell lines, several dilutions of cells (2 x 10^5, 1 x 10^6, 2 x 10^6 and 4 x 10^6) were injected into both flanks and both
shoulders of one NSG recipient mouse per dilution (n = 4 mice; 16 tumors total). After ten days, 4/4 palpable tumors formed from the 4 x 10^6
cell injections, compared to 0/4 for the 2 x 10^5 cell injections, 1/4 for the 1 x 10^6 cell injections and 1/4 for the 2 x 10^6 cell injections, and so
4 x 10^6 or more cells were used for all subsequent injections.


For the gene-expression/patient survival analysis in Fig. 4i, see "PRECOG analysis" in the Methods. For the gene-expression/patient survival
analysis in Fig. 4j, see "RNA-seq experiment and analysis" and "TCGA outcome analysis in downregulated genes upon CPD deletion" in the
Methods.


For all CRISPR screens and RNA-seq experiments, experiments were performed in two replicates. We chose two replicates
on the basis of the current standard in the field, which has yielded enough power to detect meaningful and specific
biological effects.


For in vitro validation assays, sample size is indicated in the figure legend for each experiment. The sample size was determined based on 
previous experience for each experiment to detect specific effects and it was not predetermined with any statistical methods. 


Data exclusions No data were excluded.


Replication Each in vivo experiment presented in the paper was repeated in multiple mice (20 tumor flanks in 10 mice for in vivo CRISPR screening in Fig. 2c, d; 4 mice
per genotype for the competitive growth assay in Fig. 4f-h).


All other in vitro validation assays were successfully replicated, as noted in the figure legends. Detailed materials and methods for
non-standard in vitro experiments are described in the Methods section.


Randomization For all mouse experiments, mice were randomly allocated to the experimental groups.


Blinding N/A 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems: Antibodies; Eukaryotic cell lines; Palaeontology; Animals and other organisms; Human research participants; Clinical data

Methods: ChIP-seq; Flow cytometry; MRI-based neuroimaging


Antibodies 


Antibodies used KRAS (ThermoFisher 415700, Lot # SJ260203, Dilution 1:1000 for Western Blot), IGF1R-α (ThermoFisher AHR0321, Lot #
UB2717301, 1:200 for IF), CPD (ThermoFisher A305-514A-M, Lot # 1, 1:200 for IF), MET (Cell Signaling Technology 8198, Lot # 4,
Dilution 1:200 for IF), phospho-AKT (Ser473) (Cell Signaling Technology 4060, Lot # 24, 1:200 for IF), phospho-ERK1/2 (Thr202/Tyr204)
(Cell Signaling Technology 4370, Lot # 17, Dilution 1:200 for IF), Flag (Cell Signaling Technology 14793, Lot # 5, Dilution
1:1000 for IF), GAPDH (ThermoFisher AM4300, Lot # 274128, Dilution 1:1000 for western blot), Rho1D4 (Millipore MAB5356, Lot
# 3068439, Dilution 1:2000 for IF)


Validation CPD antibody was validated by IF (Ext. Data Fig. 7b) on CPD knockdown H23 cells.


IGF1R antibody was validated by IF (Fig. 3e,f) on IGF1R knockout H23 cells and by IF (Ext. Data Fig. 8) on IGF1R knockout NCI- 
H322, A549, and NCI-H358 cell lines. 


Rho1D4 antibody was validated by IF on Rho1D4 reporter expressing H23 cells (Fig. 4c,e), NCI-H322 and A549 cells (Ext. Data Fig. 
9c-e) 


See the following "Manufacturers Statement" for the other antibodies : 


K-Ras Antibody (415700) is specific for human K-Ras (K-Ras2, Ki-Ras, c-K-ras, GTPase KRas) protein (accession # NP_004976.2, 
P01116), which is 100% homologous with mouse, 95% with rat, and 94% with bovine, respectively. Reactivity has been 
confirmed on western blots with human HeLa and WI-38 cell lysates as well as rat KNRK and mouse NIH 3T3 cell lysates, and 
identifies the target band at ~21 kDa. The reactivity has also been confirmed with rat KNRK cells using immunoprecipitation and
with HeLa cells by immunofluorescence. Based on amino acid sequence homology, reactivity with bovine is also expected. 
Product Citations: Kopp, F., Wagner, E. & Roidl, A. The proto-oncogene KRAS is targeted by miR-200c. Oncotarget 5, 185-195 
(2014). 


Met (D1C2) Rabbit mAb recognizes endogenous levels of total human Met protein. Monoclonal antibody is produced by 
immunizing animals with a synthetic peptide corresponding to residues near the carboxy terminus of human Met protein. 
Product Citations: Matsumoto, S. et al. GREB1 induced by Wnt signaling promotes development of hepatoblastoma by 
suppressing TGFβ signaling. Nat. Commun. 10, 3882 (2019); Willbold, R. et al. Excess hepsin proteolytic activity limits oncogenic
signaling and induces ER stress and autophagy in prostate cancer cells. Cell Death Dis. 10, 601 (2019); Jiang, S. et al. WNT5B 
governs the phenotype of basal-like breast cancer by activating WNT signaling. Cell Commun. Signal. 17, 109 (2019). 


Phospho-Akt (Ser473) (D9E) Rabbit mAb detects endogenous levels of Akt only when phosphorylated at Ser473. Species 
Reactivity: Human, Mouse, Rat, Hamster, Monkey, D. melanogaster, Zebrafish, Bovine. Monoclonal antibody is produced by 
immunizing animals with a synthetic phosphopeptide corresponding to residues around Ser473 of human Akt. Product Citations: 
Du, M. et al. Osthole inhibits proliferation and induces apoptosis in BV-2 microglia cells in kainic acid-induced epilepsy via 
modulating PI3K/AKt/mTOR signalling way. Pharm. Biol. 57, 238-244 (2019); Wang, Y., Li, B. & Zhang, X. Scutellaria barbata D. 
Don (SBD) protects oxygen glucose deprivation/reperfusion-induced injuries of PC12 cells by up-regulating Nrf2. Artif. Cells 
Nanomed. Biotechnol. 47, 1797-1807 (2019); Li, X., Zhang, Q. & Yang, Z. Silence of MEG3 intensifies lipopolysaccharide- 
stimulated damage of human lung cells through modulating miR-4262. Artif. Cells Nanomed. Biotechnol. 47, 2369-2378 (2019). 


Phospho-p44/42 MAPK (Erk1/2) (Thr202/Tyr204) (D13.14.4E) Rabbit mAb detects endogenous levels of p44 and p42 MAP Kinase 
(Erk1 and Erk2) when dually phosphorylated at Thr202 and Tyr204 of Erk1 (Thr185 and Tyr187 of Erk2), and singly
phosphorylated at Thr202. This antibody does not cross-react with the corresponding phosphorylated residues of either JNK/ 
SAPK or p38 MAP kinases. Species Reactivity: Human, Mouse, Rat, Hamster, Monkey, Mink, D. melanogaster, Zebrafish, Bovine, 
Dog, Pig, S. cerevisiae. Monoclonal antibody is produced by immunizing animals with a synthetic phosphopeptide corresponding 
to residues surrounding Thr202/Tyr204 of human p44 MAP kinase. Product Citations: Li, X., Ma, A. & Liu, K. Geniposide alleviates 
lipopolysaccharide-caused apoptosis of murine kidney podocytes by activating Ras/Raf/MEK/ERK-mediated cell autophagy. Artif.
Cells Nanomed. Biotechnol. 47, 1524-1532 (2019); Gao, Y. et al. Mechanical strain promotes skin fibrosis through LRG-1 
induction mediated by ELK1 and ERK signalling. Commun Biol 2, 359 (2019); Wang, S. et al. Enhancement of Macrophage 
Function by the Antimicrobial Peptide Sublancin Protects Mice from Methicillin-Resistant Staphylococcus aureus. J Immunol Res 
2019. 


Flag DYKDDDDK Tag (D6W5B) Rabbit mAb detects exogenously expressed DYKDDDDK proteins in cells. The antibody recognizes
the DYKDDDDK peptide, which is the same epitope recognized by Sigma's Anti-FLAG® antibodies, fused to either the amino- 
terminus or carboxy-terminus of the target protein. Monoclonal antibody is produced by immunizing animals with a synthetic 
DYKDDDDK peptide. Product Citations: Wang, D. et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by 
deep learning. Nat. Commun. 10, 4284 (2019); Ji, L. et al. USP7 inhibits Wnt/β-catenin signaling through promoting stabilization
of Axin. Nat. Commun. 10, 4184 (2019); Chuang, S. K., Vrla, G. D., Frohlich, K. S. & Gitai, Z. Surface association sensitizes 
Pseudomonas aeruginosa to quorum sensing. Nat. Commun. 10, 4118 (2019). 


Eukaryotic cell lines


Policy information about cell lines


Cell line source(s) NCI-H1437, NCI-H1568, NCI-H1650, NCI-H1975, NCI-H322, NCI-H1792, NCI-H2009, NCI-H23, NCI-H358, and A549 are from
ATCC


Authentication All cell lines were authenticated by the vendor (ATCC). Authentication includes an assay to detect species-specific variants of
the cytochrome C oxidase I gene (COI analysis) to rule out inter-species contamination and short tandem repeat (STR) profiling to
distinguish between individual human cell lines and rule out intra-species contamination. After transducing Cas9 into the 10
cell lines, we authenticated them again by Human 9-Marker STR Profile test provided by IDEXX BioResearch.


Mycoplasma contamination We tested the 10 cell lines for mycoplasma contamination with IDEXX BioResearch and all cell lines were negative for 
Mycoplasma sp. 


Commonly misidentified lines (See ICLAC register) N/A


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals 10- to 12-week old female NSG mice of similar weights 

Wild animals This study did not involve wild animals. 

Field-collected samples This study did not involve samples collected from the fields. 

Ethics oversight The Stanford Institute of Medicine Animal Care and Use Committee approved all animal studies and procedures 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Flow Cytometry 


Plots 


Confirm that: 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a ‘group’ is an analysis of identical markers). 


All plots are contour plots with outliers or pseudocolor plots. 


A numerical value for number of cells or percentage (with statistics) is provided. 


Methodology 
Sample preparation Subcutaneous tumors formed from injections of human cancer cell lines were individually dissected, roughly chopped using
dissecting scissors, and further dissociated into a single-cell suspension using collagenase IV, dispase, and trypsin at 37 °C
for 30 minutes with rotation. After digestion, samples were passed through a 40 µm filter and maintained in PBS with 2% FBS,
2 mM EDTA, and 1 U/mL DNase before FACS analysis.
Instrument BD Accuri
Software BD CSampler controller was used to perform FACS on dissociated tumor cells and custom Matlab scripts were used to analyze
and display FCS data generated in BD CSampler software.


Cell population abundance Purity varied considerably among tumor samples; it was determined from either GFP or
mCherry fluorescence signals in cells.


Gating strategy The same cell lines cultured in vitro were used to set the gates for the subcutaneously injected tumor cells. 


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 




Article 


Parental-to-embryo switch of chromosome 
organization in early embryogenesis 


https://doi.org/10.1038/s41586-020-2125-z 


Received: 3 April 2019 


Accepted: 16 January 2020 


Samuel Collombet, Noémie Ranisavljevic, Takashi Nagano, Csilla Varnai,
Tarak Shisode, Wing Leung, Tristan Piolot, Rafael Galupa, Maud Borensztein,
Nicolas Servant, Peter Fraser, Katia Ancelin & Edith Heard


Published online: 25 March 2020 




Paternal and maternal epigenomes undergo marked changes after fertilization. Recent
epigenomic studies have revealed the unusual chromatin landscapes that are present in
oocytes, sperm and early preimplantation embryos, including atypical patterns of
histone modifications and differences in chromosome organization and accessibility,
both in gametes and after fertilization. However, these studies have led to very
different conclusions: the global absence of local topological-associated domains
(TADs) in gametes and their appearance in the embryo versus the pre-existence of
TADs and loops in the zygote. The questions of whether parental structures can be
inherited in the newly formed embryo and how these structures might relate to
allele-specific gene regulation remain open. Here we map genomic interactions for each
parental genome (including the X chromosome), using an optimized single-cell
high-throughput chromosome conformation capture (HiC) protocol, during
preimplantation in the mouse. We integrate chromosome organization with allelic
expression states and chromatin marks, and reveal that higher-order chromatin
structure after fertilization coincides with an allele-specific enrichment of methylation
of histone H3 at lysine 27. These early parental-specific domains correlate with gene
repression and participate in parentally biased gene expression, including in recently
described, transiently imprinted loci. We also find TADs that arise in a
non-parental-specific manner during a second wave of genome assembly. These de novo domains are
associated with active chromatin. Finally, we obtain insights into the relationship
between TADs and gene expression by investigating structural changes to the paternal
X chromosome before and during X chromosome inactivation in preimplantation
female embryos. We find that TADs are lost as genes become silenced on the paternal
X chromosome but linger in regions that escape X chromosome inactivation. These
findings demonstrate the complex dynamics of three-dimensional genome
organization and gene expression during early development.


We performed allele-specific single-cell HiC, modified from previous
studies, on single blastomeres (at the 1-, 2-, 4-, 8- and 64-cell stages, as
well as oocytes) from highly polymorphic F1 hybrid embryos that were
obtained by crossing female Mus musculus domesticus (C57Bl/6J) with
male Mus musculus castaneus (CAST/EiJ) (Fig. 1a, b). After excluding cells
with poor data quality (Methods, Extended Data Fig. 1a), we used the
relative coverage of the two X chromosomes to investigate sex-specific
differences beyond autosomes (Extended Data Fig. 1b). Finally, we
used cell cycle phasing to remove cells in the pre-M and M phases,
in which chromosomes lose their organization into compartments
and/or domains (Extended Data Fig. 1c-e). Looking first at the total
contacts (that is, not split between alleles), we detected the formation
of TAD-like domains, with clear boundaries that appeared at specific
stages of development (Extended Data Fig. 1f). This was confirmed by
DNA fluorescence in situ hybridization (FISH) on three-dimensional
(3D) preserved embryos using intra- or interdomain-specific probes
(Extended Data Fig. 2).


Asymmetric chromosome architecture 


Previous studies have investigated the dynamics of TADs in mouse
embryos on the basis of TAD atlases defined in embryonic stem cells,
and have not attempted to identify any alternative, embryo-specific
domains. Our allelic data revealed that parental genomes display
a notably asymmetric structural organization before the eight-cell
stage; the maternal genome displays most of the domains called at the
one- and two-cell stages (Fig. 1c). We detected two independent gains
in domain number: the first at the two-cell stage, and the second at the
eight-cell stage. The second round of domain formation at the eight-cell
stage correlated with a previously reported progressive acquisition
of TADs (Extended Data Fig. 3a). To better capture the dynamics of
allelic domain organization, we quantified the contact enrichment
inside domains (Methods) for both parental genomes at each stage and
performed an unsupervised clustering (Fig. 1d, Extended Data Fig. 3a,
b). We found that domains fall into three main categories. The first
category (clusters 1-3) comprises parentally biased preformed domains,
which are present as early as the one-cell stage and show a bias for the
maternal (Fig. 1e, left) or paternal genome (Fig. 1e, middle). Most of
these domains (those in clusters 1 and 3) disappear by the 4-cell stage,
but a subset of maternally preformed domains (cluster 2) becomes
balanced by the blastocyst stage (64-cell stage). A second category
(clusters 4 and 5) of domains exhibits a more transient bias for one allele,
and generally has a weaker structure. In the third set (clusters 6-9),
domains are acquired symmetrically on the two parental genomes at
different stages after embryonic genome activation (Fig. 1e, right), as
previously described (Extended Data Fig. 3c).

Fig. 1 | Single-cell HiC approach to studying chromosome organization in
preimplantation embryos in the mouse. a, Scheme of the single-cell HiC
method on mouse F1 embryos. b, Timeline of embryo collection at selected
stages. The numbers of blastomeres after quality-filtering and sex assignment
are indicated (c refers to cell stage). EGA, embryonic genome activation; XCI, X
chromosome inactivation. c, Number of domains at different stages, on the
maternal (red) or paternal genome (blue). d, Clustering of domain dynamics
(rows) through stages (columns). Colour scale indicates contact enrichments
inside domains (average Z-score (Methods)) and the difference in enrichment
between the two alleles. Mat, maternal; pat, paternal. e, Snapshots of HiC maps
from the maternal (red) and paternal (blue) alleles for 3 regions, in cluster 1,
cluster 3 and cluster 7. Arrowheads indicate the domains of interest. f, Single-cell
projection in reduced space (uniform manifold approximation and
projection (UMAP)) based on the quantification of domain contacts
(n = 470 cells). g, Cluster average contacts per kilobase per million contacts
(CPKM) in single cells, ordered by pseudotime from the trajectory in f.

We also assessed whether these dynamics were discernible in single 
cells, and were not an effect of the evaluation on pseudo-bulk data. 
Notably, the quantification of domain contacts in single cells was suf- 
ficient to capture the developmental trajectories of early embryos 
(Fig. 1f, Extended Data Fig. 3d-i), as well as capturing the dynamics of 
the clusters that we identified in the pseudo-bulk data (Fig. 1g, Extended 
Data Fig. 3d-i). 
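
The single-cell quantification is reported as cluster-average contacts per kilobase per million contacts (CPKM; Fig. 1g). The sketch below shows one plausible reading of that unit, by analogy with RPKM in RNA-seq; the exact normalization is described in the Methods, so the formula, the function names and the contact counts here are assumptions made for illustration only.

```python
def cpkm(contacts, length_bp, total_contacts):
    """Contacts per kilobase per million contacts for one domain in one cell."""
    if length_bp <= 0 or total_contacts <= 0:
        raise ValueError("length and total contacts must be positive")
    return contacts / (length_bp / 1e3) / (total_contacts / 1e6)


def cluster_average_cpkm(domains, total_contacts):
    """Mean CPKM over the domains of one cluster in a single cell.

    domains -- list of (contacts inside the domain, domain length in bp)
    """
    values = [cpkm(c, length, total_contacts) for c, length in domains]
    return sum(values) / len(values)


# Hypothetical single cell: two domains of one cluster, counted separately
# on each parental genome, each allele with 200,000 assigned contacts.
maternal = cluster_average_cpkm([(40, 500_000), (10, 250_000)], 200_000)
paternal = cluster_average_cpkm([(20, 500_000), (5, 250_000)], 200_000)
allelic_difference = maternal - paternal
```

Dividing by domain length and by the total number of contacts makes values comparable across domains of different sizes and across cells of different coverage, which matters when single cells are ordered along a pseudotime trajectory.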

In conclusion, our results identify parent-of-origin-specific levels of
chromosome organization as early as the 1-cell stage that are mostly
resolved as the 2 genomes mature towards the 64-cell stage, except
for cluster 2. These data reconcile those of previous studies, and
provide insights into the early differential organization of the two
parental genomes.


Parental domains and histone modification 

To evaluate whether this unusual parental asymmetry in structure
might be linked to specific chromatin states, we integrated our data
with chromatin immunoprecipitation and sequencing (ChIP-seq)
data for histone modifications from early embryos. Notably,
parental-specific early domains (clusters 1-3) coincide with large
accumulations of the Polycomb-associated mark, trimethylation of
histone H3 at Lys27 (H3K27me3); the strongest enrichment of this mark
is associated with the maternal genome, whereas the de novo-formed
domains (clusters 6-9) are depleted for this mark (Fig. 2a, b, Extended
Data Fig. 4a-e). Whereas H3K27me3 domains are maintained up to
the eight-cell stage and diminish thereafter (Fig. 2c), the structural
domains are lost or transformed by the four-cell stage, concomitantly
with a transient gain in the H3K4me3 mark (Fig. 2c, Extended Data
Fig. 4f). We note that the enrichment of H3K27me3 occurs during
oogenesis (Extended Data Fig. 4g) and that the domains of cluster 2
appear as early as postnatal day 5, but not in sperm (Extended Data
Fig. 4h).

Parentally preformed domains also exhibit interactions between 
domains similar to the patterns of A and B compartments (Fig. 2d). We 
found that the parentally preformed domains form allele-specific B-like 
compartments at the two-cell stage (Fig. 2e, Extended Data Fig. 4i, j). 
These domains also display stronger interactions between domains at 
the 2-cell stage than do the de novo domains at the 64-cell stage (Fig. 2f, 
Extended Data Fig. 4k). Parentally preformed domains are depleted for 
CTCF motifs flanking their borders (Extended Data Fig. 4l), which points
towards independence from this factor (as has previously been shown
for compartments”). Altogether, these results suggest that parental- 
specific domains might form local compartments associated with the 
Polycomb-repressive mark after fertilization, which later dissolve into 
the classical A and B compartments (Fig. 2g). 


Parental domains and transient imprint 


To evaluate how the allele-specific dynamics of chromosome organi- 
zation relate to gene expression, we examined previously published 
RNA-sequencing data obtained from equivalent F1 hybrid preimplantation
embryos. We found that parentally preformed domains are
associated with generally lower gene expression (Fig. 3a, top, Extended 
Data Fig. 5a) and an average lower expression on the structured allele 
(Fig. 3a, bottom), as well as a higher frequency of strongly biased genes 
(Extended Data Fig. 5b). Gene ontology analysis revealed that silenced 
genes within early preformed clusters are significantly enriched for 
terms associated with tissue morphogenesis, such as neurogenesis 
or osteogenesis (Extended Data Fig. 5c), the expression of which is 
required only at late developmental stages. Conversely, symmetric 
de novo clusters were predominantly enriched in genes that drive the 
patterning of the embryo at preimplantation (such as cell cycle, lineage
specification, metabolism and gene regulation). 

Maternally preformed domains encompass most genes that have previously
been described as transiently maternally imprinted (19 out
of 27 genes), such as the X inactivation centre locus (Fig. 3b, Extended
Data Fig. 6a, b). Indeed, at the two-cell stage Xist is encompassed in a
maternal-specific domain, the left border of which coincides with the
Xist TAD that has previously been described in embryonic stem cells;
the right border of this maternal-specific domain is slightly shifted
with respect to the previously described Xist TAD, and excludes the
Xist transactivator Rlim (Extended Data Fig. 6c). Accordingly, Xist is
maternally repressed, whereas the adjacent Rlim is kept expressed
on the maternal allele and becomes silenced upon X chromosome
inactivation (Fig. 3c). We noticed a similar pattern of shifting from
maternal imprinted domains at the two-cell stage to TADs at the
blastocyst stages for other transiently imprinted genes, such as Tle3, Enc1
and Mbnl2 (Extended Data Fig. 6d-h).

To investigate the importance of such domains for imprinted gene
regulation, we focused on the maternal 3D domain spanning the Xist
locus and engineered genetic deletions around the Jpx and Ftx loci,
within a region that has previously been proposed to be sufficient
for imprinted X chromosome inactivation (Fig. 3d). Jpx is a putative
regulator of Xist. We found that mice with a deletion encompassing
Jpx are viable, and that normal expression of Xist occurs in these mice
(Fig. 3e, Extended Data Fig. 6i, j). Whereas Ftx deletion alone is
dispensable for imprinted X chromosome inactivation in preimplantation
embryos, the maternal transmission of the deletion containing Jpx
and Ftx strongly compromised female viability (5 ΔJpx-Ftx/wild-type
female mice out of 46 pups received the deleted allele, corresponding
to 11% transmission) and no viable male could be obtained (0%
transmission) (Fig. 3f). Taken together, our analysis identifies a minimal
control region for imprinting in proximity to Xist, and opens up new
possibilities for testing other transient imprint regions.

Fig. 2 | Early domains are associated with Polycomb and form local
compartments. a, Heat maps of H3K27me3 ChIP-seq signal on domains (scaled
to 1 Mb) from each cluster, on the maternal (red) (left) and paternal (blue)
(right) alleles. Data are taken from Gene Expression Omnibus (GEO) accession
GSE76687. RPM, reads per million reads sequenced. b, Average ChIP-seq signal
at the 2-cell stage on the maternal (left) and paternal (right) alleles, for
cluster 1 (n = 375 domains), cluster 3 (n = 387 domains) and cluster 7
(n = 287 domains). c, Quantification of H3K27me3 (top) or H3K4me3 (bottom)
enrichment (versus mean of the genome (Methods)) or domain strength (middle,
average Z-score) for cluster 1, cluster 3 and cluster 7 (n values as in b).
Lines represent the mean, and shading represents the 95% confidence interval
of the mean. The maternal allele is in red and paternal allele is in blue.
H3K4me3 data are taken from GSE71434. d, Snapshots of ChIP-seq and HiC maps
(40-kb resolution) on the maternal (left) and paternal (right) alleles for a
locus on chromosome 17. e, Dynamics of compartment scores (principal
component analysis first eigenvectors) for cluster 1, cluster 3 and cluster 7
(n values as in b). The A and B compartments are assigned on the basis of
gene density (Methods). Line represents the mean, and shading represents the
95% confidence interval of the mean. f, Average HiC map enrichment of
long-distance interactions (>1 Mb) around the intersection between domain
centres (n values as in b). g, Model of the parental preformed local
compartment to de novo-acquired conventional A and B compartments.

Fig. 3 | Parental preformed domains are associated with a transient imprint.
a, Distribution of gene expression (top) and fraction of maternal expression
(maternal/(maternal + paternal)) (bottom) for genes present within the
domains of selected clusters. Box plots (top) represent ±1.5× interquartile
range, 25th and 75th percentiles and median values. Lines represent the mean,
and shading represents the 95% confidence interval of the mean. RPKM, reads
per kilobase of transcript per million mapped reads. b, Snapshots of HiC on
the maternal (red) and paternal (blue) genome, and H3K27me3 ChIP-seq, at the
Xist locus (female cells only were pooled; n = 43 at the 2-cell stage, n = 83
at the 64-cell stage). c, Allele-specific expression of Rlim and Xist from
the 2-cell stage to the 64-cell stage. d, Scheme of the X inactivation centre
and of the CRISPR deletions that we engineered. ESC, embryonic stem cell.
e, Genotype distribution after maternal transmission of Jpx deletion
(n = 85 pups). WT, wild type. f, Genotype distribution after maternal
transmission of Jpx and Ftx deletion (Δ) (n = 46 pups).
Fig. 4 | Structural changes at the paternal X chromosome during imprinted
X chromosome inactivation. a, Number of domains called on the maternal
X chromosome (red) and on the paternal X chromosome (blue) over
preimplantation development, and the relative difference of domains on the
paternal versus maternal alleles (autosomes in black, X chromosome in blue).
Box plot represents the ±1.5× interquartile range, 25th and 75th percentiles
and median value for all autosomes (n = 19). b, Expression dynamics for
early-silenced, late-silenced and escapee genes (n = 40, 76 and 52,
respectively, as in a previous study). c, Structural changes in the
corresponding domains that contain genes in the categories shown in b.
d, Parental differential (top) and allele-specific (bottom and middle) HiC
contact maps (pooled female cells) over the entire X chromosome (resolution
of 640 kb) in neural progenitor cells (left) and at 64-cell stage (right).
DNA FISH probes (oligonucleotide pools a and b) are indicated in colours over
the centromeric megadomain. Lines represent the mean and shading represents
the 95% confidence interval of the mean. Xa, active X chromosome; Xi,
inactive X chromosome. e, Representative 3D RNA-DNA FISH images of 16-cell
stage (top) or 64-cell stage (bottom) embryo with corresponding box plot
(±1.5× interquartile range, 25th and 75th percentiles and median value)
quantifications for signal correlations. Statistical significance
(P < 0.001) was assessed using Wilcoxon’s rank-sum test (two-sided).
n = 39 nuclei from 8 female embryos for 16-cell stage; n = 106 and
103 signals from 106 nuclei from 4 female embryos for 64-cell stage. DNA is
counterstained with DAPI. Scale bar, 10 μm. Bgd, background; Xm, maternal
X chromosome; Xp, paternal X chromosome. f, Dynamics of the volume of the
paternal and maternal X chromosomes. Box plots represent ±1.5× interquartile
range, 25th and 75th percentiles and median value. P values are indicated
above the box plot, and were calculated using Mann-Whitney U test
(two-sided). n = 43, 46, 35 and 83 single cells for the 2-cell, 4-cell,
8-cell and 64-cell stages, respectively. g, Three-dimensional model of
whole-genome conformation for 64-cell-stage single cell number 118. Maternal
chromosomes are in red; paternal chromosomes are in blue; autosomes are shown
as thin lines and X chromosomes are highlighted. The model was computed at
500-kb resolution.
Features of imprinted X inactivation


In differentiated female cells, the inactive X chromosome is organized 
into two megadomains rather than into A and B compartments, and 
displays a marked weakening of TADs; however, little is known of the
dynamics of this organization during development. Pooling only female 
cells, we found that the paternal X chromosome displays a strong deficit 
in domains compared to its maternal counterpart (Fig. 4a). Whereas 
preformed maternal domains are lost, domains that are formed de novo 
become weaker on the paternal genome by the blastocyst stage, with 
the exception of a small subset of domains (Extended Data Fig. 7a). 
Comparing the dynamics of structural domains with those of gene 
expression, we found that early silenced loci on the paternal X chromo- 
some show a marked loss of domain strength only after the eight-cell 
stage (that is, after silencing initiation), and domains that contain
late-silenced genes display little structural change (although imprinted
X chromosome inactivation is largely complete) (Fig. 4b, c). Although 
we cannot formally exclude that this might be due to differences in 
sensitivity between RNA sequencing and single-cell HiC, these results 
suggest that the loss of TAD structure on the paternal X chromosome 
would follow or accompany, rather than precede, gene silencing. 
Using 3D modelling of chromosomes, we also found that early 
silenced genes are localized more at the centre of the paternal X chro- 
mosome whereas escapees tend to be located at its periphery (Extended 
Data Fig. 7b), similar to differentiated cells”*. However, megadomains 
do not appear on the paternal X chromosome (Fig. 4d) despite a higher 
colocalization of intradomain probes by DNA FISH (Fig. 4e), which 
suggests a global compaction of the inactive paternal X chromosome. 
Three-dimensional modelling confirmed that the paternal X chro- 
mosome was substantially smaller (by approximately a third) than 
its maternal homologue at the 64-cell stage (Fig. 4f) and adopted a 
more globular shape (whereas the maternal X chromosome is more 
elongated) (Fig. 4g), as has previously been reported in somatic cells”°. 


Conclusions 


Here we show that higher-order chromatin structure matures from 
parental-specific and early repressive compartments towards a progres- 
sive establishment of TADs in early development in the mouse (Fig. 2g). 
This developmental switch might illustrate the autonomous mecha- 
nisms at play—cohesin-dependent and -independent—that have previ- 
ously been observed for the 3D organization of the genome” and that 
might also reflect the unusual chromatin landscape and nuclear organi- 
zation of the early embryo, compared to later developmental stages’”°. 
Early compartments are Polycomb-marked and are accompanied by 
contrasting allelic gene-expression states. These parentally preformed 
repressive domains may be important in counterbalancing genome- 
wide embryonic genome activation for transiently imprinted genes such 
as Tle3 (the dose of which affects the pluripotency programs) or Xist
(which is central to the process of gene dose compensation in females).
Our study also illustrates that, after embryonic genome activation, 
structures tend to be TAD-like and their appearance is generally linked to 
active chromatin states. In the case of the paternal X chromosome, the 
loss of TAD structure during X chromosome inactivation is a late event 
that seems to follow—rather than precede—gene silencing. Furthermore, 
we find that there is progressive compaction of the paternal X chromo- 
some, but no megadomain formation, by the blastocyst stage. Local 
domains are maintained only across escapee loci, suggesting that local 
structure might require an active chromatin state and/or transcription. 

Overall, our study provides broad insights into the intricate inter- 
play between chromosome folding and parental gene activity with the 
developmental potential of the early embryo. 
Methods


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Mouse embryo collection, single-cell dissociation and 
formaldehyde fixation 

Five-week-old female C57BL/6J mice were purchased from Charles 
River. Animal care and use for this study were performed in accordance 
with the recommendations of the European community (2010/63/ 
UE). All experimental protocols were approved by the ethics commit- 
tee of the Institut Curie CEEA-IC118 under the number APAFIS#8812- 
2017020611033784v2, given by national authority in compliance with 
the international guidelines. When stated, intraperitoneal injection of 
5 IU pregnant mare’s serum gonadotropin, followed 46 h later by injection
of 5 IU human gonadotropin, were applied to induce ovulation
of female mice. DNA FISH was performed on embryos collected from 
superovulated C57BL/6J (B6) female mice (except for the blastocyst 
stage), mated with C57BL/6J (B6) male mice. The single-cell HiC pro- 
tocol was applied to blastomeres of embryos collected from crosses 
between C57BL/6J (B6) female mice and CAST/EiJ male mice. In the case
of the one-cell, two-cell and four-cell stages, some embryos were
collected after female superovulation. Embryos were collected from the
reproductive tracts in M2 medium at defined time periods according
to mating and/or hCG administration (given in this order): 14 h or 21 h
for 1-cell stage (pronuclear stage 3 or 4), 37 h or 44 h for late 2-cell
stage, 48 h or 55 h for 4-cell stage, 55 h or 62 h for 8-cell stage and 80 h
for blastocyst stages (approximately 60 to 64 cells) (64-cell stage).
B6 pure oocytes were collected 15 h after hCG injection. Embryos were
included in the analyses when they showed a normal morphology and
the correct number of blastomeres for their developmental stage. Zona
pellucida and polar bodies were removed using acid Tyrode’s solution
and/or gentle pipetting (except in a few cases for the blastocyst stage).
Embryos were incubated in Ca²⁺- and Mg²⁺-free M2 medium for 5 to 30
min to remove the polar body in zygotes or to isolate individual cells at
subsequent stages. For the blastocyst stage, incubation with Ca²⁺- and
Mg²⁺-free M2 medium was replaced with a 5-min incubation in TrypLE
(Invitrogen). During the picking, the origin of the blastomere (inner
cell mass or trophectoderm) was not recorded. Blastomeres were
mechanically dissociated and rinsed three times in PBS/acetylated BSA
(Sigma) before being fixed for 10 min in a 2% formaldehyde solution
at room temperature. Fixation was stopped by transferring cells to a
127-mM glycine solution (5 min on ice). Blastomeres from different
embryos were pooled from this step onwards to perform the single-cell 
HiC procedure post-fixation. 


Single-cell HiC procedure 

The procedure for embryo blastomeres was optimized from a previous
study. Care was taken at every step to reduce putative contamination
between solutions. In brief, following fixation and rapid rinses in 1x
PBS solution containing 1% acetylated BSA (Sigma), blastomeres were
permeabilized for 30 min on ice in 10 mM Tris-HCl (pH 8), 10 mM NaCl, 0.2%
IGEPAL CA-630 containing complete EDTA-free protease inhibitor
cocktail (Roche). Cells were transferred to a protein low binding tube
(Sigma) containing 0.3% SDS diluted with 1.24× NEBuffer3 for 60 min
at 37 °C with constant agitation. Triton X-100 was added to 2% final and
incubation was extended for 60 min, before addition of 625 U of MboI
(New England Biolabs) and overnight incubation. To label the digested
DNA ends, a mix containing 28.4 μM final of dCTP, dGTP and dTTP and
biotin-14 dATP was added with 25 U DNA polymerase, large (Klenow)
fragment (New England Biolabs) for 60 min with constant agitation.
After spinning, blastomeres were treated with 10 U of T4 DNA ligase
(Thermo Fisher) in the presence of 1x reaction buffer with 1x BSA (both
from New England Biolabs) at 16 °C for at least 4 h. After spinning,
blastomeres were resuspended with PBS 1x and BSA 1 mg/ml to distribute them
individually into PCR tubes (in strips; one per tube) before storage at
−80 °C until further processing.


Library preparation and sequencing 

To prepare single-cell HiC libraries from single nuclei in PCR strips, 5
μl of PBS was added to each well and crosslinks were reversed by incubating
at 65 °C overnight. HiC concatemer DNA was fragmented and linked
with sequencing adapters using the Nextera XT DNA library preparation
kit (Illumina), by adding 10 μl of Tagment DNA buffer and 5 μl of
Amplicon Tagment mix, incubating at 55 °C for 20 min, then cooling
down to 10 °C, followed by addition of 5 μl of Neutralize Tagment buffer
and incubation for 5 min at room temperature. HiC ligation junctions
were then captured by Dynabeads M-280 streptavidin beads (Thermo
Fisher) (20 μl of original suspension per single-cell sample). Beads were
prepared by washing with 1x BW buffer (5 mM Tris-Cl pH 7.5, 0.5 mM
EDTA, 1 M NaCl), resuspended in 4x BW buffer (20 mM Tris-Cl pH 7.5, 2
mM EDTA, 4 M NaCl; 8 μl per sample), and then mixed with the 25-μl
sample and incubated at room temperature overnight with gentle agitation.
The beads were then washed 4 times with 200 μl of 1x BW buffer,
twice with 200 μl of 10 mM Tris-Cl pH 7.5 at room temperature, and
resuspended in 25 μl of 10 mM Tris-Cl pH 7.5. Single-cell HiC libraries
were amplified from the beads by adding 15 μl of Nextera PCR master
mix, 5 μl of i7 Index primer of choice and 5 μl of i5 Index primer of choice.
Samples were then incubated at 72 °C for 3 min, 95 °C for 30 s followed
by the thermal cycling at 95 °C for 10 s, 55 °C for 30 s and 72 °C for 30
s for 18 cycles, and then incubated at 72 °C for 5 min. The supernatant
was separated from the beads and purified one by one with AMPure
XP beads (Beckman Coulter; 0.6 times volume of the supernatant)
according to manufacturer's instructions and eluted with 30 μl each of
10 mM Tris-Cl pH 8.5. The eluate was purified once more with AMPure
XP beads (equal volume to the previous eluate) and eluted with 11 μl
of 10 mM Tris-Cl pH 8.5.

Before sequencing, the libraries were quantified by quantitative PCR 
(Kapa Biosystems) and the size distribution was assessed with Agilent 
2100 Bioanalyzer (Agilent Technologies). The libraries were sequenced 
by 2 x 150-bp paired-end run using either a HiSeq 1500, HiSeq 2500 or 
NextSeq 500 (Illumina). 


Bioinformatics analysis 

All data were mapped to the mouse genome mm10, using the C57BL- 
6J/CAST-EiJ single nucleotide polymorphisms (SNPs) from the mouse 
genome project (v.5 SNP142), and the gene annotation from Ensembl
(v.92). Analyses were performed in R (v.3.4.2) and Bioconductor (v.3.6).
Gene ontology analysis was performed using the package clusterProfiler
(v.3.10.1). 


HiC data processing 

Data were processed with HiC-Pro (v.2.11.0) in allele-specific mode.
The following parameters were used. For mapping: --very-sensitive -L 30
--score-min L,-0.6,-0.2 --end-to-end --reorder. No minimal fragment
size, insert size or contact distance were defined. For processing:
GET_ALL_INTERACTION_CLASSES = 0, GET_PROCESS_SAM = 0, RM_SINGLETON = 1,
RM_MULTI = 1, RM_DUP = 1. For ICE scaling: MAX_ITER = 100,
FILTER_LOW_COUNT_PERC = 0.02, FILTER_HIGH_COUNT_PERC = 0,
EPS = 0.1. Only pairs with both reads having MAPQ > 30 were kept.


Cell cycle phasing 

Cell cycle phasing was done by plotting the proportion of short-range 
contacts (between 25 kb and 2 Mb) versus long-range contacts (between 
2 Mb and 12 Mb) in single cells. An ellipsoid was fitted to the single-cell
points, as in a previous publication. The reference in polar coordinates
was set to the segment going from the centre of the ellipsoid to the
point of coordinates [0.15, 0.35], which corresponds to the beginning
of the left-ascending part of the single-cell trajectory. Cells in the G1, S
and G2 phases were defined as those in the angle between 0 and −0.35π
(65° anticlockwise). For each stage, contacts from all cells phased in G1
and S were pooled (all contacts or genome-specific contacts independently)
and matrices at 10- and 40-kb resolution were created using
cooler (v.0.7.9, parameter: --balance). Data were visualized in HiGlass.
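
The two quantities plotted for phasing can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the fractions are assumed to be taken over all contacts supplied for a cell, and the angle is simply measured anticlockwise-positive from the centre-to-reference segment. The distance windows and the reference point [0.15, 0.35] come from the text; the ellipse fitting itself is not shown.

```python
import math

# Distance windows (bp) taken from the text.
SHORT_RANGE_BP = (25_000, 2_000_000)
LONG_RANGE_BP = (2_000_000, 12_000_000)


def range_fractions(distances_bp):
    """Return (short-range, long-range) contact fractions for one cell.

    Assumption: the denominator is the number of contacts passed in.
    """
    total = len(distances_bp)
    if total == 0:
        raise ValueError("cell has no contacts")
    short = sum(1 for d in distances_bp
                if SHORT_RANGE_BP[0] <= d < SHORT_RANGE_BP[1])
    long_ = sum(1 for d in distances_bp
                if LONG_RANGE_BP[0] <= d < LONG_RANGE_BP[1])
    return short / total, long_ / total


def angle_from_reference(point, centre, reference=(0.15, 0.35)):
    """Angle (radians, in [-pi, pi)) of `point` around `centre`,
    measured from the centre-to-reference segment."""
    ref = math.atan2(reference[1] - centre[1], reference[0] - centre[0])
    ang = math.atan2(point[1] - centre[1], point[0] - centre[0])
    return (ang - ref + math.pi) % (2 * math.pi) - math.pi
```

In such a scheme each cell becomes one point (its two fractions), and cells are selected by comparing their angle with the window stated above.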


Domain calling 

Domains were first identified on the 40-kb matrices, independently 
for each stage, on both the maternal and paternal genomes, using 
3DNetMod (v.1.0.10.06.17), with parameters favouring sensitivity
over specificity. PREPROCESSING: region_size 150, overlap 100,
logged True, qnorm False. GPS: badregionfilter True, scale genome
wide, plateau 8, chaos filter True, chaos_pct 0.85, diagonal_density
0.65, consecutive_diagonal_zero 20. MMCP: num_part 20, plots
False, pctile_threshold 0, pct_value 0. HSVM: size_threshold 7, size_s1
600000, size_s2 1200000, size_s3 3000000, size_s4 6000000, size_s5
12000000, var_thresh1 0, var_thresh2 50, var_thresh3 100, var_thresh4
100, var_thresh5 100, boundary_buffer 80000. For the analysis of the
X chromosome in female cells, domains were called from the female 
pseudo-bulk HiC maps. 


Domain average enrichment 

We converted the HiC matrices to Z-score matrices, in which the scores
are normalized to the distribution of scores for the same contact
distance, as in a previous publication. In brief, for any two loci $i$ and
$j$ on chromosome $c$, separated by a distance $n$ and with a balanced count
of contacts $C_{i,j}$, the corresponding Z-score is
$Z_{i,j} = (C_{i,j} - \mu_n)/\sigma_n$, in which $\mu_n$ and $\sigma_n$
are the mean and standard deviation of the distribution of
contact counts for any pair of loci distant by $n$. Z-score matrices were
calculated on the 40-kb matrices with HiCExplorer (v.2.1.1) using the
hicFindTADs function (parameters: --correctForMultipleTesting None
--minDepth 60000000 --maxDepth 200000000 --step 60000000
--thresholdComparisons 1 --delta 0). For analysis of the X chromosome,
contacts from female cells only were pooled and matrices obtained in
the same way.
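
As an illustration of the distance normalization described above, the following Python sketch computes a Z-score for each entry of a small symmetric contact matrix, using the mean and standard deviation of all entries at the same bin separation. It is a hedged sketch and not the HiCExplorer implementation: the function name is hypothetical, the population standard deviation is an assumption, and diagonals with zero variance are set to 0 by convention.

```python
from statistics import mean, pstdev


def zscore_by_distance(matrix):
    """Return a Z-score matrix in which every entry is normalized to the
    distribution of entries at the same separation (same diagonal)."""
    n = len(matrix)
    z = [[0.0] * n for _ in range(n)]
    for d in range(n):
        # All balanced counts for pairs of bins separated by d bins.
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        mu = mean(diagonal)
        sigma = pstdev(diagonal)  # assumption: population SD
        for i in range(n - d):
            value = 0.0 if sigma == 0 else (matrix[i][i + d] - mu) / sigma
            z[i][i + d] = value
            z[i + d][i] = value  # keep the matrix symmetric
    return z
```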

The average contact enrichment of domains was computed by averaging
the Z-score over the domain upper triangle, excluding the diagonal.
For a domain $D$ spanning bins $i$ to $j$, the upper triangle $T$ in matrix
$M$ is the submatrix $T[a,b]$ with $a \in \{i,\dots,j-1\}$,
$b \in \{i+1,\dots,j\}$ and $b > a$.
This was calculated using the custom function hicSummarizePerRegion
for HiCExplorer, available from the E.H. laboratory GitHub version of
HiCExplorer at https://github.com/heard-lab/HiCExplorer, branch
SummarizePerRegion, or directly from
https://github.com/heard-lab/HiCExplorer/blob/SummarizePerRegion/hicexplorer/hicSummarizeScorePerRegion.py.
We kept only domains with an average Z-score > 0.5.
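
The upper-triangle average can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the custom HiCExplorer function itself: the helper names are hypothetical, domains are given as inclusive bin indices (i, j), and the 0.5 threshold is the one quoted in the text.

```python
def domain_mean_score(z, i, j):
    """Mean of z[a][b] over the upper triangle of a domain spanning
    bins i..j (inclusive), excluding the diagonal (i <= a < b <= j)."""
    values = [z[a][b] for a in range(i, j) for b in range(a + 1, j + 1)]
    if not values:
        raise ValueError("domain must span at least two bins")
    return sum(values) / len(values)


def keep_strong_domains(z, domains, threshold=0.5):
    """Keep domains whose average Z-score exceeds the threshold."""
    return [(i, j) for i, j in domains
            if domain_mean_score(z, i, j) > threshold]
```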


Overlapping domain filtering 

As largely overlapping domains with very similar boundaries can be
called within or between different time points, we further filtered
redundancy using a custom script (available on GitHub, from
https://github.com/heard-lab/HicTools/blob/master/FilterRegions_MinMutualOverlap_maxScore.r).
In brief, starting from a set of domains $D_{n=0}$
equal to the set of all domains $D_{\mathrm{all}}$, and the empty sets
$D_{\mathrm{overlap}}$ and $D_{\mathrm{highest}}$,
the following steps were used: (1) From $D_n$, all pairs of overlapping
domains are compared two by two. (2) If their overlap represents more
than 70% of each other’s lengths, they are added to the set
$D_{\mathrm{overlap}}$. (3) For each pair of overlapping domains (>70%),
only the domain with the highest score is kept and added to the set
$D_{\mathrm{highest}}$. (4) $D_{n+1}$ is assigned the
union of $D_{\mathrm{highest}}$ and all domains from $D_n$ that were not in
$D_{\mathrm{overlap}}$. The procedure is repeated from step 1 to step 4
until $D_{n+1} = D_n$. The reinjection in step (4) of all domains from $D_n$
that were not in $D_{\mathrm{overlap}}$ allows
us to keep isolated domains, as well as avoiding chains between pairs
of domains. For stage-specific analysis (Figs. 1c, 4a) this procedure
was applied to the domains called at each stage and on each genome
individually. For the dynamic analysis across stages, sets of all domains
called individually at each stage and on each genome (after this
redundancy filtering) were pooled together as one set and filtered with the
same procedure, resulting in one common set of domains.
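
The four steps above can be sketched as a small fixed-point loop in Python. This is an illustrative re-implementation under stated assumptions, not the authors' R script: domains are (start, end, score) tuples on a single chromosome, "overlap of each other's lengths" is taken as the smaller of the two overlap fractions, and ties in score are broken by coordinate order.

```python
def mutual_overlap(a, b):
    """Smaller of the two fractions of each domain covered by the overlap."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def filter_redundant(domains, min_overlap=0.7):
    """Iteratively keep the best-scoring domain of each redundant pair."""
    current = set(domains)
    while True:
        overlapping, highest = set(), set()
        items = sorted(current)
        # Steps 1-3: compare all pairs; record redundant pairs and winners.
        for x in range(len(items)):
            for y in range(x + 1, len(items)):
                a, b = items[x], items[y]
                if mutual_overlap(a, b) > min_overlap:
                    overlapping.update((a, b))
                    highest.add(a if a[2] >= b[2] else b)
        # Step 4: winners plus every domain that was never redundant.
        following = highest | (current - overlapping)
        if following == current:
            return sorted(following)
        current = following
```

For example, `filter_redundant([(0, 100, 1.0), (10, 100, 2.0), (200, 300, 0.7)])` drops the first domain, which is 90% covered by the higher-scoring second one, and keeps the isolated third domain.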


Clustering 

Domain dynamics clustering was performed using the R package Mfuzz 
(2.26.0), using as input the average Z-score per domain (row) in each
stage from the 1-cell stage to the 64-cell stage, on the maternal and 
paternal genomes (columns). Fuzzification parameter m was estimated 
using the mestimate() function. The number of clusters was defined as 
nine, on the basis of the minimal distance between cluster centroids. 


Single-cell analysis 

The sum of contacts per domain for each genome per single cell was 
computed using the function hicSummarizePerRegion (as described 
in ‘Domain average enrichment’), excluding the diagonal. The matrix 
of counts (domains on rows, single-cell maternal genome and single-cell
paternal genome on columns) was used as input for monocle3.
Data were processed using the preprocess_cds function using the first
75 components of the principal components analysis (parameters:
num_dim = 75, method = "PCA", norm_method = "log"). Dimension
reduction was performed using UMAP with the reduce_dimension
function (max_components = 2) and graph for pseudotime inferred
using learn_graph (parameters: use_partition = FALSE,
learn_graph_control = list(minimal_branch_len = 3)). For cluster average
score, counts per domains were converted to CPKM by dividing the counts by
the total number of contacts in domains per allele (divided by $10^6$), and
by the domain length in kb.
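
The CPKM conversion is a simple two-step scaling, sketched below in Python. The function name is hypothetical, and the sketch assumes that the scaling constant is one million and that the total is taken over the domains supplied for one allele of one cell.

```python
def cpkm(counts, lengths_bp):
    """Convert per-domain contact counts to contacts per kilobase per
    million contacts, for one allele."""
    per_million = sum(counts) / 1e6  # total contacts in domains, in millions
    return [count / per_million / (length / 1000)
            for count, length in zip(counts, lengths_bp)]
```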


Compartments and domain interactions 

Compartments were called using HiTC (v.1.26.0). An aggregate plot
of interaction between pairs of domains was performed using a custom
function hicAggregateContact for HiCExplorer (available on GitHub,
from https://github.com/deeptools/HiCExplorer, branch aggregateGenome;
parameters: --range 1000000:999000000 --numberOfBins
200 --avgType mean --genome --regionReferencePosotion centre), which
also outputs the list of pairs of domains with respect to the distance
threshold (that is, distance of more than 1 Mb). Only domains that did
not contain another domain were used to avoid redundancy between
domains that contained one another. The normalized contact counts
of the intersection between pairs of domains were calculated using
a custom function hicSummarizeScorePerRegion for HiCExplorer
(available on GitHub, from https://github.com/heard-lab/HiCExplorer,
branch SummarizePerRegion, or directly from
https://github.com/heard-lab/HiCExplorer/blob/SummarizePerRegion/hicexplorer/hicSummarizeScorePerRegion.py;
parameter: --summarizeType sum).


Chromosome 3D modelling 

Three-dimensional modelling of chromosomes (allele-specific) was performed
using the programs Dip-C and hickit. We performed 3 rounds
of 3D reconstruction at 100-kb resolution with 3D haplotype imputation
(parameters: -temps 20 -s 8 4 2 0.4 0.2 0.1), and then 2 rounds of
3D reconstruction at 20-kb resolution with 3D haplotype imputation
(parameters: -temps 20 -s 8 4 2 0.4 0.2 0.1 0.04 0.02). Chromosome
volumes were calculated using the alpha-convex hull algorithms from
the R package alphashape3d (α = 0.6).


ChIP-seq analysis 

Reads were trimmed using Trim Galore (v.0.4.4), mapped using STAR
(v.2.5.3a, parameters: --outFilterMultimapNmax 1 --outFilterMismatchNmax
999 --outFilterMismatchNoverLmax 0.06 --alignIntronMax 1
--alignMatesGapMax 2000 --alignEndsType EndToEnd --outSAMattributes
NH HI NM MD), and removed when they mapped to the mitochondrial
genome. The remaining reads were split by allele using SNPsplit
(v.0.3.2). Allele-specific and unassigned .bam files were sorted,
duplicates removed using Picard (v.2.18.2, parameters:
REMOVE_DUPLICATES = true ASSUME_SORTED = true) and pooled as the total reads.
BigWig coverage files were generated using deepTools bamCoverage
(parameters: --extendReads --binSize 1, with --extendReads 200 for
single-end data). A scaling factor was calculated as $10^6$/total number
of reads, and the same factor was given as the parameter
--scaleFactor for both allelic signals. The heat map and average plots of
signal were generated using deepTools computeMatrix scale-regions (with
parameters: --regionBodyLength 1000000 --beforeRegionStartLength
1000000 --afterRegionStartLength 1000000 --binSize 50000) as well as
plotHeatmap and plotProfile. For quantification of ChIP-seq in domains,
reads were counted using the featureCounts function from Subread
(v.1.28.1, parameters: -p -s 0). Data scaling was performed in R using
DESeq2 (v.1.18.1), calculating the sizeFactor on the count of total reads
and applying it to the allele-specific counts. Enrichment relative to
background was calculated as the ChIP-seq signal per domain in RPKM,
divided by the average RPKM on the genome calculated in 10-kb bins.
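
The enrichment ratio in the last sentence can be written out as a short Python sketch. The function names are hypothetical and the sketch assumes raw read counts as input; it leaves out the DESeq2 size-factor scaling described above and is meant only to show the arithmetic of RPKM and of the domain-over-background ratio.

```python
def rpkm(reads, length_bp, total_reads):
    """Reads per kilobase per million sequenced reads."""
    return reads / (length_bp / 1e3) / (total_reads / 1e6)


def domain_enrichment(domain_reads, domain_length_bp, bin_reads,
                      total_reads, bin_size_bp=10_000):
    """Domain RPKM divided by the mean RPKM of genome-wide bins."""
    background = sum(rpkm(r, bin_size_bp, total_reads)
                     for r in bin_reads) / len(bin_reads)
    return rpkm(domain_reads, domain_length_bp, total_reads) / background
```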


RNA-sequencing analysis 

Single-cell RNA-sequencing data were processed similarly to those from
ChIP-seq, except for the mapping, for which the following parameters
were used: -outFilterMultimapNmax 1 -outFilterMismatchNmax 999
-outFilterMismatchNoverLmax 0.06 -alignIntronMax 500000
-alignMatesGapMax 500000 -alignEndsType EndToEnd -outSAMattributes
NH HI NM MD. The quantification of expression was performed using
featureCounts (parameters: -p -s 0 -t exon -g gene_id). Data were then
analysed in R using DESeq2 (v.1.18.1), calculating the sizeFactor on the
count of total reads and applying it to the allele-specific counts.
Filtering was performed similarly to that in a previous publication. In
single-cell data, a pseudo-RPKM score was calculated as the normalized
count × 1,000/gene length in base pairs; as the previously used protocol
is 3′-biased and does not recover more than the last 3 kb of the
transcripts, longer genes (>3 kb) were assigned a length of 3 kb. In
single-cell data, genes with a pseudo-RPKM value <5 (not allele-specific)
and a count of reads lower than 10 on both alleles were assigned as lowly
expressed. An allelic D-score (expression_paternal/(expression_maternal +
expression_paternal)) was calculated only for genes that were not lowly
expressed, to avoid artefactual strong bias due to noisy, lowly expressed
genes. Single-cell data were then pooled in pseudo-bulk by stage, and for
each gene an average D-score was calculated only when more than 20% of
single cells had an allelic D-score calculated (that is, did not show too
low expression on both alleles). Average pseudo-RPKM values were
calculated by averaging the pseudo-RPKM values of all single cells
without filtering.
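
The pseudo-RPKM and D-score rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the per-cell tuple layout are hypothetical, "lower than 10 reads on both alleles" is read as each allele having fewer than 10 reads, and cells with no allelic reads are skipped to avoid dividing by zero.

```python
MAX_LENGTH_BP = 3_000  # genes longer than 3 kb are assigned a length of 3 kb


def pseudo_rpkm(normalized_count, gene_length_bp):
    """Normalized count x 1,000 / gene length, with length capped at 3 kb."""
    return normalized_count * 1_000 / min(gene_length_bp, MAX_LENGTH_BP)


def is_lowly_expressed(score, maternal_reads, paternal_reads):
    """Pseudo-RPKM < 5 and fewer than 10 reads on each allele (assumed reading)."""
    return score < 5 and maternal_reads < 10 and paternal_reads < 10


def d_score(maternal, paternal):
    """Allelic D-score: paternal / (maternal + paternal)."""
    return paternal / (maternal + paternal)


def average_d_score(cells, gene_length_bp, min_fraction=0.2):
    """Pseudo-bulk D-score for one gene at one stage.

    cells: list of (normalized_count, maternal_reads, paternal_reads) per cell.
    Returns None unless more than min_fraction of cells have a D-score.
    """
    scores = []
    for normalized_count, maternal, paternal in cells:
        score = pseudo_rpkm(normalized_count, gene_length_bp)
        if is_lowly_expressed(score, maternal, paternal):
            continue
        if maternal + paternal == 0:  # no allelic information in this cell
            continue
        scores.append(d_score(maternal, paternal))
    if len(scores) <= min_fraction * len(cells):
        return None
    return sum(scores) / len(scores)
```

With this definition, a D-score of 0.5 corresponds to balanced expression, values near 1 to paternal bias and values near 0 to maternal bias.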


DNA FISH probes 

Probes for DNA FISH on the X chromosome were obtained as previously
described, or using BAC DNA for chromosome 13 (purchased
from CHORI: RP24-278M23; RP23-325G4; RP23-2B17; RP23-222A16;
RP24-389D15; RP23-302B3; RP23-359G6; RP23-326J5; RP23-307F19)
or were purchased from MYcroarray (fluorescent oligonucleotides,
average length 45 bp, 5′-modified with Atto 448 or Atto 550, average
density: one oligonucleotide every 3 kb). Oligonucleotides were
designed to tile the following consecutive 18-Mb regions: chromosome
X: 35,000,000-53,000,000 (termed pool a) and chromosome
X: 53,000,000-72,000,000 (termed pool b). To prepare the probe
mix for DNA FISH, 100 ng of labelled BAC DNA was used, along with
5 µg of Cot-1 DNA, and resuspended in formamide before adding an equal
volume of hybridization buffer (2×; 20% dextran sulfate; 4× SSC; 1 mM
EDTA; 0.1% Triton X-100; 0.5 mg/ml BSA; 1 mg/ml PVP). Oligonucleotide
probes were used in formamide at 10% final concentration.


DNA FISH procedure on embryonic stem cells 
FISH on cells from tissue culture was performed as previously
described. Feeder-free male mouse embryonic stem cells (E14;
GSM1366337) were cultured on gelatin-coated coverslips no. 1.5
(1 mm) and fixed in 3% paraformaldehyde for 10 min at room temperature.
Permeabilization was then performed on ice for 5 min in 1× PBS
containing 0.5% Triton X-100 and 2 mM vanadyl-ribonucleoside complex
(New England Biolabs). Coverslips were preserved in 70% EtOH
at -20 °C. Prior to FISH, samples were dehydrated through an ethanol
series (80%, 95% and 100%, twice) and air-dried quickly. DNA FISH was
preceded by sample denaturation in 50% formamide in 2× SSC at pH
7.2 at 80 °C for 40 min. After overnight hybridization at 42 °C, washes
were carried out at 45 °C, 3 times 5 min in 50% formamide in 2× SSC
at pH 7.2 and 3 times 5 min in 2× SSC. DAPI at 0.2 mg/ml was used for
counterstaining, and the mounting medium consisted of 90% glycerol,
0.1× PBS and 0.1% p-phenylenediamine at pH 9 (Sigma).


Three-dimensional DNA FISH procedure on embryos and Xist RNA FISH combined with DNA FISH using oligonucleotide probes

Collected embryos were prefixed for 1 min at room temperature in 1%
paraformaldehyde (PFA) with 1 mg/ml polyvinylpyrrolidone (PVP),
pre-permeabilized for 1 min at room temperature in PFA 0.5% and Triton
X-100 0.4%, and fixed for 10 min at room temperature in PFA 4%. After a
brief wash in PBS 1× with PVP 1 mg/ml and Triton X-100 0.05% (PBS-TP),
embryos were permeabilized for 1 h at 37 °C in PBS 1× with Triton X-100
0.5% (with RNase A 5 µl/ml in case of DNA FISH). After a brief rinse in
PBS-TP, embryos were transferred into hybridization buffer 1× and
equilibrated overnight with 1 mg/ml Cot-1 DNA mix at 37 °C. Embryos and
probes were denatured for 10 min at 83 °C and put back for at least 3 h
at 37 °C. After competition in Cot-1 mix, embryos were moved into the
probe mix overnight at 37 °C. Excess probe was eliminated through 3
washes at 45 °C in SSC 2× solution and SSC 0.2× solution for 10 min
each. Embryos were then briefly washed in PBS 1× and mounted in a
Vectashield drop containing DAPI under oil on a glass-bottomed plate
coated with poly-lysine.


Microscopy and image analysis 

Combined RNA and DNA FISH imaging was performed on an inverted
confocal microscope (Zeiss LSM700) with a Plan apo DIC II (numerical
aperture 1.4) 63× oil objective. Z-sections were taken every 0.4 µm.
Structured illumination for DNA FISH was performed using an OMX system
(Applied Precision) as in a previous publication. Signals from all
channels were realigned using fluorescent beads before each session of
image acquisition. Colocalization analysis was restricted to a region of
interest of identical volume around the FISH signal. The respective
intensities of red and green channels were retrieved semi-automatically
using the JACoP ImageJ plugin, and box plot distributions of the Pearson
correlation coefficient were compared using Wilcoxon's rank-sum
statistics with R.
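
For readers unfamiliar with the colocalization metric, the Pearson correlation coefficient between two channels can be sketched as below. This is a generic textbook implementation, not the JACoP plugin's code; the function name and the flat lists of voxel intensities from one region of interest are assumptions for illustration.

```python
import math


def pearson(red, green):
    """Pearson correlation between paired red and green voxel intensities."""
    n = len(red)
    if n != len(green) or n < 2:
        raise ValueError("need two equally sized lists with at least 2 values")
    mean_red = sum(red) / n
    mean_green = sum(green) / n
    covariance = sum((r - mean_red) * (g - mean_green) for r, g in zip(red, green))
    var_red = sum((r - mean_red) ** 2 for r in red)
    var_green = sum((g - mean_green) ** 2 for g in green)
    if var_red == 0 or var_green == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return covariance / math.sqrt(var_red * var_green)
```

Values close to 1 indicate that the two probe pools occupy the same voxels; values near 0 indicate spatially separated signals.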


Engineering mice 

The mouse mutant lines were generated following a previously described
strategy, with minor modifications. Single-guide (sg)RNAs were
designed using CRISPOR. For deleting the locus containing Jpx and Ftx,
we used sgRNAs no. 57 (GGTCACAATTATGCAACCTG), no. 58
(ATACTCCGGATTACATACTC), no. 61 (TGCCCAAGCAAAAAGCGTGA) and no. 62
(AAAGTATTGACACCTTACCC). For deleting the Jpx locus, we used sgRNAs
no. 57, no. 58, no. 59 (TGCCCAAGCAAAAAGCGTGA) and no. 60
(AGTTAGATACCACACCAAGT). T7-sgRNA PCR products were used as
the template for in vitro transcription with the MEGAshortscript T7 kit
(Life Technologies) and the products were purified using the MEGAclear
kit (Life Technologies). sgRNAs were eluted in DEPC-treated RNase-free
water, and their quality was assessed by electrophoresis on an agarose gel
after incubation at 95 °C for 3 min with the denaturing agent provided with
the in vitro transcription kits. Cas9 mRNA (Tebu-bio, L-7206) and sgRNAs
were injected at 100 ng/µl and 50 ng/µl, respectively, into the cytoplasm
of mouse B6D2F1 zygotes from eight-week-old superovulated B6D2F1
(C57BL/6J × DBA2) female mice mated to stud male mice of the same
background. Zygotes with well-recognized pronuclei were collected
in M2 medium (Sigma) at E0.5. Injected embryos were cultured in M16
medium (Sigma) at 37 °C under 5% CO2, until transfer at the one-cell stage
the same day or at the two-cell stage the following day to the infundibulum
of the oviduct of a pseudogestant CD1 female at E0.5 (25-30 embryos
were transferred per female). All weaned mice (N0) were genotyped for
presence of the deletion (locus covering Jpx and Ftx, primers RG140.1:
TGCTACCGGTCACAGATATAAGT and RG145: TCTGGGATGCTTGTTCAACA; Jpx
locus, primers RG140.1 and RG143: ACAAGGTGAGCGATGAGACA). Mice
carrying deletion alleles were crossed to B6D2F1 mice and their progeny
screened again for the presence of the deletion allele; PCR products
were sequenced to determine the exact location of the deletions (locus
covering Jpx and Ftx, chromosome X: 100,683,288-100,801,657, mm9;
Jpx: 100,683,306-100,702,361, mm9). The F1 mice were considered the
'founders' and bred to B6D2F1 mice; their progeny was then backcrossed
to B6D2F1 mice to generate heterozygous mice, and lines were kept
in heterozygosity. To establish mouse embryonic fibroblasts, single
embryos were recovered at day 13.5 of gestation after the confirmation of
vaginal plugs on ΔJpx/wild-type females bred with wild-type/Y or ΔJpx/Y
males. Head and internal organs were removed and the body cavity was
incubated for 1 h at 37 °C in TrypLE (Invitrogen). After repetitive pipetting
up and down, the resulting chunks were put in culture for 24-48 h until
collected to prepare RNA with Trizol extraction for further examination
by quantitative PCR. The level of gene expression was normalized to the
geometric mean of the expression level of the Ppia and Gapdh housekeeping
genes according to the geNorm method to assess the relative expression of
Xist and Jpx. The following primers were used and are listed as
forward/reverse and in 5′ to 3′ orientation: Gapdh,
ccccaacactgagcatctcc/attatggggetctgsgatgg; Ppia,
ttacccatcaaaccattccttctg/aacccaaagaacttcagtgagagc; Jpx,
ataaaatggcgecetccac/geccagtttctccactctcc; and Xist,
ggttctctctccagaagctaggaa/tgetagatggcattetetattatateg.
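
The normalization step described above (target expression divided by the geometric mean of the two housekeeping genes) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and converting threshold cycles (Ct) to relative quantities with an amplification efficiency of 2 is an assumption that the text does not state.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_quantity(ct, efficiency=2.0):
    """Relative quantity from a Ct value (assumes a fixed amplification efficiency)."""
    return efficiency ** (-ct)


def normalized_expression(target_ct, reference_cts, efficiency=2.0):
    """Target quantity divided by the geometric mean of the reference genes.

    reference_cts would hold the Ct values of Ppia and Gapdh for the same sample.
    """
    target = relative_quantity(target_ct, efficiency)
    references = [relative_quantity(ct, efficiency) for ct in reference_cts]
    return target / geometric_mean(references)
```

Using the geometric rather than the arithmetic mean keeps one highly expressed reference gene from dominating the normalization factor.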


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The HiC data generated and analysed are available in the GEO reposi- 
tory under accession number GSE129029. Previously published data 
were downloaded from GEO: H3K27me3 in early embryos (GSE76687); 
H3K27me3 in day-5 post-natal oocytes (GSE93941); single-cell RNA 
sequencing in early embryos (GSE80810); and HiC in gametes and 
early embryos (GSE82185). Source Data for Figs. 3, 4 and Extended
Data Figs. 2, 6 are provided with the paper. Any other relevant data are
available from the corresponding authors upon reasonable request. 


Code availability 


The code developed for this study is available on the GitHub repository 
of the laboratory of E.H. (https://github.com/heard-lab). 
Extended Data Fig. 1 | Single-cell HiC approach to studying chromosome
organization in mouse preimplantation embryos. a, Distribution of total
contact versus trans ratio per single blastomere, according to developmental
stage with given thresholds for exclusion. b, Fraction of maternal contacts on
the X chromosome versus contacts on the Y chromosome. The colour of each
dot indicates the fraction of reads that cover the maternal genome. Red
rectangles highlight female diploid cells, blue rectangles highlight male cells
and black rectangles highlight haploid cells (that is, oocytes or polar bodies).
Cells outside these frames were excluded. c, Percentage of short-range
(25 kb-2 Mb) versus long-range or mitotic contacts (2-12 Mb) per single cell,
coloured by developmental stages. d, Subset of the single cells at eight-cell
stage, either in G1, S or G2 phase (top) or going towards mitosis (bottom), and
their corresponding pseudo-bulk HiC heat maps. e, Table for the number of
single blastomeres per stage of development that passed quality control, and
the selected number after cell cycle phasing that were used to produce the
subsequent analysis and heat maps. f, Bar plot of domain numbers for each
developmental stage.
Extended Data Fig. 2 | HiC view and DNA FISH for two independent genomic
loci. a, d, g, HiC contact maps for different genomic locations (as indicated),
from the 1-cell to 64-cell stage. b, c, e, f, h, i, Analysis of the genomic locations
for boundary formation (red and green probes in bottom of a, d and g) by 3D
DNA FISH in two-cell-stage to eight-cell-stage embryos and embryonic stem
cells (ESC), with insets of signal for the two independent pools (b, e, h). The
total number of combined signals (red plus green) is reported in the box plot in
the adjacent panels (c, f, i). DNA is stained by DAPI (blue). Scale bar, 10 µm. c,
Box plot (±1.5× interquartile range, 25th and 75th percentiles and median value)
distribution of Pearson's correlation coefficient for red and green signals (in
pools 1 and 2) of DNA FISH analysis. a-c, Chromosome 13 (region 90 Mb-92 Mb).
d-f, X chromosome (region 104 Mb-105 Mb). g-j, Chromosome 13
(region 14 Mb-15 Mb). All experiments are performed in biological replicates,
n is the combined signal number, centre lines denote the median coefficient.
Statistical significance (P < 0.001) was assessed using Wilcoxon's rank sum test
(two-sided).
Extended Data Fig. 3 | Dynamics of domains in single cells. a, Distribution of
the minimal distance between cluster centroids (D_min) for a predefined number
of clusters (k) ranging from 2 to 40. Clustering was performed 100 times for
each value of k. The optimal number of clusters is the highest value of k before
the value D_min becomes stagnant. b, Heat maps representing the result of
clustering for different values of k. The same main categories are found for
k > 8. The contact enrichment colour scale corresponds to the maternal (red)
and paternal (blue) heat maps; the differential contact enrichment scale
corresponds to the differential (maternal − paternal) heat maps. c, Heat maps
showing domain enrichment in the bulk HiC data from GSE82185, with the same
order as our clustering in Fig. 1d and showing similar dynamics. d, Single-cell
projection by UMAP from the quantification of domain contacts on each allele,
using all cells and all chromosomes, coloured by stage (top) or by sex (bottom).
n = 669 single cells. e, As in d but excluding domains on the X chromosome. f, As
in e but coloured by cell cycle phasing. g, Cell cycle phasing based on
short-range versus mitotic contacts, with the same colour scale as in f. h, Single-cell
projections after excluding oocytes, all cells in pre-M and M phase and domains
on the X chromosome, as in Fig. 1f, coloured by sex (top) or by pseudotime
overlaid with the inferred trajectory (bottom). n = 470 single cells. i, As in h,
coloured by mean count per kb per million (CPKM) on each allele, for the nine
clusters identified in Fig. 1d.



Extended Data Fig. 4 | Chromatin changes and compartment formation over
preimplantation. a, Average profile of H3K27me3 ChIP-seq signal at the
domains for each parental allele at the 2-cell and 64-cell stages in clusters 1 to 9
(n = 375, 238, 387, 338, 110, 327, 287, 194 and 141 for each cluster from 1 to 9).
b, Distributions of H3K27me3 domain enrichment per cluster, on the maternal
(red) and paternal (blue) genomes at the one-cell stage. Box plots represent
±1.5× interquartile range, 25th and 75th percentile and median value. n values
are the same as in a. c, Statistical comparison, two-by-two, between each
distribution in b. P values are calculated using a Wilcoxon test (two-sided, not
paired). n values are the same as in a. d, e, As in b, c for H3K27me3 ChIP-seq data
from epiblasts. f, Heat maps of H3K4me3 ChIP-seq signal at domains of each
cluster ± 1 Mb, with parental origin. g, Heat maps of H3K27me3 ChIP-seq signal
at domains of each cluster ± 1 Mb in oocytes (post-natal day 5 or day 14; or
ovulatory oocytes (MII)). h, Snapshots of H3K27me3 ChIP-seq signal covering
6 Mb at transiently imprinted loci (Xist, Enc1, Jade1 and Mbnl2) for different
stages of oogenesis, or the maternal allele in the 2-cell and 64-cell stages.
i, Compartment scores at domains of clusters 1-9, according to parental origin.
j, Dynamics of the compartment scores for each cluster. Lines represent the
mean, and shading represents the 95% confidence interval of the mean.
n values are the same as in a. k, Bar plot of long-range interactions per stage,
corresponding to the average heat map in Fig. 2f. l, CTCF-motif enrichment
around domains.
Extended Data Fig. 5 | Gene expression and functional annotation of domain
clusters. a, Distribution of gene expression (top; n = 797, 353, 612, 621, 268, 699,
562, 278 and 193 genes for clusters 1 to 9) and fraction of maternal expression
(maternal/(maternal + paternal), bottom; n = 232, 249, 256, 502, 258, 664, 497,
[…] 9, respectively) for genes present within domains of the different clusters.
b, Pie charts for allelic expression bias from the 2-cell to the 64-cell stage for
genes within clusters 1 to 9. c, P value (hypergeometric test) of Gene Ontology
term enrichment in genes within each domain cluster.
Extended Data Fig. 6 | Structural tuning at maternal early domains during
preimplantation. a, Snapshots of HiC matrices and H3K27me3 ChIP-seq
signal, showing the parental differences between the 2-cell and 64-cell stages
for maternal (red) and paternal (blue) genomes at chromosome 2 (9-13.5 Mb)
containing Sfmbt2. b, As in a, for chromosome 3 (40-43 Mb) containing Jade1.
c, Quantification of contacts within the region presented in Fig. 3b. d, Snapshot
of HiC matrices and H3K27me3 ChIP-seq signal, showing the parental
differences between the 2-cell and 64-cell stages for maternal (red) and
paternal (blue) genomes at chromosome 9 (60-62.5 Mb) containing Tle3.
e, Gene-expression dynamic for Tle3 for maternal (red) and paternal (blue)
alleles. f, Quantification of contacts within the region shown in d. g, Snapshots
of HiC matrices and H3K27me3 ChIP-seq signal, showing the parental
differences between the 2-cell and 64-cell stages for maternal (red) and
paternal (blue) genomes at chromosome 13 (96-100 Mb) containing Enc1. h, As
in g, for chromosome 14 (115-122 Mb) containing Mbnl2. i, Relative gene
expression for Xist (in red) or Jpx (in yellow) in mouse embryonic fibroblasts
derived from embryos issued from crossing ΔJpx/wild-type female mice with
wild-type/Y or ΔJpx/Y male mice. The three genotypes analysed are indicated,
as well as the number of independently derived mouse embryonic fibroblast
cultures from independent single embryos (n = 4, 6 and 6 for wild-type/wild-type,
wild-type/ΔJpx and ΔJpx/ΔJpx genotypes, respectively). Bar plot
represents the mean of each independent expression value (for each embryo),
error bars represent the s.d. and each dot represents an individual
embryo value. j, Pie chart distribution of the genotypes obtained after mating
ΔJpx/wild-type female mice with ΔJpx/Y male mice. n = 104 pups.
Long noncoding RNAs (lncRNAs) and promoter- or enhancer-associated unstable
transcripts locate preferentially to chromatin, where some regulate chromatin structure,
transcription and RNA processing. Although several RNA sequences responsible for
nuclear localization have been identified (such as repeats in the lncRNA Xist and Alu-like
elements in long RNAs), how lncRNAs as a class are enriched at chromatin remains
unknown. Here we describe a random, mutagenesis-coupled, high-throughput method
that we name 'RNA elements for subcellular localization by sequencing' (mutREL-seq).
Using this method, we discovered an RNA motif that recognizes the U1 small nuclear
ribonucleoprotein (snRNP) and is essential for the localization of reporter RNAs to
chromatin. Across the genome, chromatin-bound lncRNAs are enriched with 5′ splice
sites and depleted of 3′ splice sites, and exhibit high levels of U1 snRNA binding compared
with cytoplasm-localized messenger RNAs. Acute depletion of U1 snRNA or of the U1
snRNP protein component SNRNP70 markedly reduces the chromatin association of
hundreds of lncRNAs and unstable transcripts, without altering the overall transcription
rate in cells. In addition, rapid degradation of SNRNP70 reduces the localization of both
nascent and polyadenylated lncRNA transcripts to chromatin, and disrupts the nuclear
and genome-wide localization of the lncRNA Malat1. Moreover, U1 snRNP interacts with
transcriptionally engaged RNA polymerase II. These results show that U1 snRNP acts
widely to tether and mobilize lncRNAs to chromatin in a transcription-dependent
manner. Our findings have uncovered a previously unknown role of U1 snRNP beyond the
processing of precursor mRNA, and provide molecular insight into how lncRNAs are
recruited to regulatory sites to carry out chromatin-associated functions.


To identify cis elements that contribute to the localization of RNA to
chromatin, we developed and performed REL-seq screens with saturated
fragment coverage of nine representative lncRNA and mRNA
transcripts in mouse and human cells (Extended Data Fig. 1a-f and
Supplementary Note 1). The strategy involves expressing a randomly
fragmented RNA sequence alone or fusing it with a minigene encoding
green fluorescent protein (GFP), and then analysing its subcellular
location through sequencing. We detected a total of 26 chromatin-enriched
RNA fragments (enChrs; P < 0.05), mainly 50-500 nucleotides
in length, in the sense orientation of the host chromatin-associated
RNAs but not in cytoplasm-located mRNAs (Extended Data Fig. 2 and
Supplementary Tables 1, 2). To uncover key residues that contribute
to RNA localization, we chose a 162-nucleotide NXF1-enChr, identified
from an isoform of NXF1 in which the introns are retained in the
final mRNA, for random mutagenesis followed by REL-seq (mutREL-seq;
Extended Data Figs. 1b, 2g). Out of 23 mutations with impaired
chromatin binding, 19 are located in a loop region (positions 39-45)
containing 7 nucleotides, which, together with 2 upstream nucleotides,
comprise a U1-recognition site that base pairs perfectly with the
9-nucleotide 5′ targeting sequence of U1 snRNA (Fig. 1a, b, Extended
Data Fig. 1g-j and Supplementary Note 2). The majority of enChrs overlap
with predicted U1-recognition sites and/or exhibit strong binding
signals of U1 snRNA, as shown by RNA affinity purification followed
by RNA sequencing (RAP-RNA-seq), except for repeat-associated
enChrs in Xist (Extended Data Figs. 1k, 2, Supplementary Table 2 and
Supplementary Discussion 1).

U1 snRNP defines the 5′ splice sites of pre-mRNAs and initiates
spliceosome assembly at introns, a process that involves sequential
recruitment of U1, U2 and then U4/U6-U5 tri-snRNPs. It has been
reported that the 5′ splice site regulates the nuclear retention of a handful
of mRNAs. To test a role of U1 recognition in RNA-chromatin
retention, we constructed GFP reporters that harbour U1 motifs but
lack the 3′ splice site, in vectors containing either a polyadenylation
signal (PAS) or a Malat1 3′-end sequence (Extended Data Fig. 3a, b).
The Malat1 3′-end sequence stabilizes GFP RNA through a triple-helix
RNA structure, thus bypassing the inhibitory effect of U1 snRNP
on polyadenylation and RNA stability. The wild-type but not mutant
U1-targeting sequences promoted the chromatin association of GFP in
both the PAS and the 3′ Malat1 reporters (Extended Data Fig. 3c-g and
Supplementary Note 3). In addition, NXF1-enChr RNA captured core
Fig. 1 | MutREL-seq identifies a U1-recognition site that contributes to
RNA-chromatin tethering. a, MutREL-seq identifies a seven-nucleotide
region (dashed box) that contributes to chromatin retention of NXF1-enChr.
The y axis shows the fold change of cytoplasmic versus chromatin signals,
normalized to that in wild-type cells. Grey dots indicate deletions. b, Predicted
secondary structure of NXF1-enChr and its interaction with U1 snRNA. U:G
wobble base-pairing is indicated by dots. The scale bar shows the probability of
base-pairing (or lack of pairing). c, Densities of predicted strong
U1-recognition sites in exons (top) and 3′ splice sites (3′ss) in the gene body


components of U1 snRNP, whose knockdown impaired the chromatin
binding of reporter RNA (Extended Data Fig. 3h-m, Supplementary
Table 3 and Supplementary Note 4). To test a role of the 3′ splice site
in RNA localization, we inserted an ACTB intron into a GFP reporter.
Mutation of the 3′ splice site dramatically increased chromatin-bound
GFP RNA, in a manner dependent on an intact 5′ U1 site (Extended Data
Fig. 3n). These results demonstrate that U1 snRNP promotes the chromatin
association of a reporter RNA that harbours a 5′ U1-recognition
site but lacks the 3′ splice sequence.

Although U1 motifs tend to be depleted in gene exons, lncRNA transcripts
exhibit substantially higher densities of the U1-recognition motif
in exons, yet lower densities of 3′ splice sites in the gene body, compared
with mRNA (Fig. 1c and Extended Data Fig. 4a-d). In addition,
the densities of predicted U1-recognition sites and levels of U1 snRNA
binding increase gradually from the most cytoplasm-enriched mRNA
to chromatin-enriched mRNA and then to lncRNAs, whereas their
expression decreases with increased chromatin association (Fig. 1d
and Extended Data Fig. 4e, f).

The global enrichment of U1 snRNA recognition sites and binding
on lncRNAs led us to explore their role in the localization of lncRNAs to
chromatin. We used antisense morpholino oligonucleotides (AMOs) to
block the 5′-end-recognition sequence of U1 snRNA. Given the broad
involvement of U1 snRNP in cellular functions, we performed a short-term,
2-h treatment with U1 AMOs in mouse embryonic stem cells
(mESCs). This attenuated splicing but, unlike prolonged treatment, did
not elicit apoptosis (Extended Data Fig. 5a-d and Methods). We then
performed strand-specific sequencing of total RNAs isolated from
chromatin, nucleoplasm and cytoplasm fractions, and calculated relative
enrichments in each compartment. Among 1,282 lncRNAs with
detectable expression in mESCs, only 4 were upregulated on chromatin,
whereas 337 (26.3%; P < 0.05) showed decreased chromatin associations
yet increased cytoplasm or nucleoplasm signals (Fig. 2a, b, Extended
Data Figs. 5e, f, 6 and Supplementary Table 4).

To reveal immediate effects in an inducible way, we sought to deplete
the U1 snRNP component SNRNP70 by using an auxin-inducible degron
(AID) system (SNRNP70^AID) in mESCs (Methods). Four-hour treatment
with auxin led to a 90% depletion of SNRNP70 protein, which impaired the
binding of U1 snRNA to its targeted lncRNAs, and attenuated splicing, but
(bottom) of lncRNAs (U1 site: n = 3,385; 3′ss: n = 7,661) and mRNAs (U1 site:
n = 47,298; 3′ss: n = 78,227) in mice. Genes with transcript length (for U1 sites) or
genomic length (for 3′ss) of more than 1 kilobase (kb) were analysed.
d, Enrichment in chromatin localization (top) and U1 RAP-RNA signals
(bottom) in mRNAs (n = 2,262 genes in each group) and lncRNAs (n = 97;
fragments per kilobase per million mapped reads (FPKM) of more than 1; no
overlap with mRNA) that are ranked from low to high levels of chromatin
association (left to right). For c and d, box plots show 5th, 25th, 50th, 75th and
95th percentiles, with median values labelled. P-values, two-sided t-tests.


had a minor effect on the expression of the lncRNAs analysed, and did not
elicit apoptosis (Extended Data Fig. 5g-k). Consistently, total RNA-seq of
auxin-treated SNRNP70^AID mESCs revealed 346 chromatin-downregulated
lncRNAs (27%; P < 0.05), which substantially overlap with those affected
by U1 AMOs and show increased signals in the cytoplasm and/or nucleoplasm
fractions (Fig. 2, Extended Data Figs. 5f, l, m, 6 and Supplementary
Table 5). The sets of chromatin-downregulated lncRNAs following U1
AMO and/or SNRNP70^AID treatment exhibit stronger U1 snRNA binding
activities and higher expression levels compared with lncRNAs that are not
downregulated, providing evidence that U1 snRNP promotes chromatin
association of its target lncRNAs (Extended Data Fig. 5n).

We then investigated whether the U1 mechanism regulates the localization
of mature transcripts after a lncRNA is made. Sequencing of polyadenylated
RNA (polyA-seq) revealed that degradation of SNRNP70^AID
led to roughly 23.6% of lncRNAs (295; P < 0.05) being downregulated on
chromatin, which correlated significantly (R = 0.49, P = 5.0 × 10^[…]) with
the change in total RNA-seq (Fig. 2a, b, Extended Data Figs. 6, 7a-c and
Supplementary Table 6). To investigate whether newly synthesized polyA
RNA shows similar changes, we pulse-labelled SNRNP70^AID mESCs with
4-thiol-uridine (4sU) and performed thiol-linked alkylation for metabolic
sequencing (SLAM-seq). Chemical conversion of the newly incorporated
4sU into cytidine discriminates new transcripts from pre-existing transcripts
(Extended Data Fig. 7d, e). Among 492 polyA lncRNAs with new
transcripts detectable by SLAM-seq, 115 (23.4%) were downregulated on
chromatin in auxin-treated mESCs, while only 3 were upregulated (Fig. 2c
and Supplementary Table 7). New transcripts and all transcripts (new plus
pre-existing) show highly correlated changes (R = 0.81, P = 4.1 × 10^[…])
upon SNRNP70^AID degradation (Extended Data Fig. 7f). Notably, both
well-spliced lncRNAs (for example, Meg3 and Rian) and poorly spliced
lncRNAs (such as Lncenc1, Tsix and Pvt1) show decreased chromatin binding
after U1 inhibition (Extended Data Fig. 6). These findings rule out a
kinetic effect due to delayed release of nascent or unspliced RNAs from
their transcription sites that contributes mainly to chromatin retention.

Chromatin-downregulated lncRNAs with slightly longer transcript
length exhibit similar distributions of total RNA signals across the gene
body before and after U1 inhibition, and do not show higher decreases
in expression compared with lncRNAs that are not downregulated
(Extended Data Figs. 5n, 7g, h), arguing against globally premature
Fig. 2 | U1 snRNP regulates lncRNA-chromatin retention. a, Heat maps showing,
left, the chromatin/non-chromatin ratio (where 'non-chromatin' refers to
cytoplasm plus nucleoplasm) of 337 chromatin-downregulated lncRNAs in total
RNA-seq upon AMO treatment; and right, the chromatin/non-chromatin ratio of
346 chromatin-downregulated lncRNAs in total RNA-seq and the chromatin/cell
ratio (where 'cell' refers to the whole cell) in polyA-seq upon SNRNP70^AID
degradation. See also Extended Data Fig. 5f. CT, no auxin. b, A genome-browser view
of the Lncenc1 locus; see also Extended Data Fig. 6g. In this panel, mm9 is a mouse


termination. To further assess potential transcriptional changes, we
performed transient transcriptome sequencing of nascent transcripts
(TT-seq) and RNA polymerase II (Pol II) chromatin immunoprecipitation
followed by sequencing (ChIP-seq). Acute degradation of SNRNP70^AID did
not alter the overall transcription rate and the distribution of paused and
elongating Pol II across the genome (Extended Data Fig. 7i-k). Moreover,
depletion of EXOSC3, an essential subunit of the RNA exosome, failed
to rescue the observed decreases of chromatin-bound lncRNAs in
SNRNP70^AID mESCs (Extended Data Fig. 7l). These combined analyses elucidate
a direct effect of U1 snRNP in tethering both nascent and polyadenylated
lncRNAs to chromatin, excluding the possibility that there are indirect
consequences due to changes in transcription dynamics, RNA processing
and decay on lncRNA-chromatin associations under our assay conditions
of inhibiting U1 snRNP for short periods of time.

We detected substantial amounts of SNRNP70 and SNRPC on chromatin,
and this association was sensitive to high-salt extraction (Extended
Data Fig. 8a). Mass-spectrometry analysis of the SNRNP70 complexes
identified proteins involved in transcription regulation, such as the
Pol II large subunit POLR2A and elongation factors SPT5 and SPT6,
besides splicing factors (Extended Data Fig. 8b, c and Supplementary
Table 8). Using native chromatin extracts that were released by the
nuclease benzonase (which degrades RNA and DNA), we confirmed
that SNRNP70 captured phosphorylated Pol II (S2P, phosphorylated
at serine 2) and SPT6 in wild-type but not auxin-treated SNRNP70^AID
mESCs (Extended Data Fig. 8d). Reciprocal co-immunoprecipitation of
phosphorylated Pol II (S5P and S2P), but not hypophosphorylated Pol
II (8WG16), captured SNRNP70 and SNRNPC (Fig. 3a). Thus, U1 snRNP
binds transcriptionally engaged Pol II on chromatin, in a manner that
is likely to be independent of RNA and DNA. Treatment of mESCs with
the transcription inhibitors flavopiridol or triptolide reduced levels of
chromatin-bound SNRNP70 and SNRNPC proteins and also U1 snRNA
(Fig. 3b and Extended Data Fig. 8a, e-g), consistent with a previous
report of disrupted enrichment of U1 snRNA in the gene body of active
genes by flavopiridol (Supplementary Discussion 2). Inhibition of
transcription also led to reduced chromatin associations of analysed
lncRNAs (Fig. 3b and Extended Data Fig. 8e). These results suggest a
central role of the active Pol II machinery in promoting the chromatin
association of U1 snRNP and its target lncRNAs.
genome assembly. The top two sets of tracks show the total RNA-seq of whole-cell
and chromatin fractions after treatments with control (CT), U2 or U1 AMOs. The
lower three sets of tracks show total and polyA RNA-seq of whole-cell and chromatin
fractions and TT-seq in SNRNP70^AID mESCs at 0 h or 4 h of auxin treatment.

c, Volcano plots showing fold changes (log2) of the chromatin/non-chromatin ratio
of 1,282 lncRNAs identified by total RNA-seq or of 492 newly synthesized lncRNAs
identified by SLAM-seq upon SNRNP70^AID depletion. AID, plus auxin. P-values were
obtained by two-sided t-test for three biological replicates.


We then investigated whether other spliceosome components might
contribute to lncRNA-chromatin retention. Treatment of mESCs with
the drug E7107, which specifically targets the U2 snRNP, also attenuated
chromatin associations of analysed lncRNAs (Fig. 3b and Extended
Data Fig. 8e). By contrast, inhibiting the recruitment of the U4/U6-U5
tri-snRNPs to the catalytically active spliceosome by using the drug
isoginkgetin or by depletion of the tri-snRNP components PRPF8 or
SNRNP200 failed to have an effect (Fig. 3b and Extended Data Fig. 8e, h).
We further inactivated U2 or both U1 and U2 (U1/2) snRNAs with AMOs.
Inhibition of U2 snRNA alone had a much subtler effect, altering the
localization of a subset of chromatin-bound lncRNAs, and inhibition of
both U1/2 snRNAs did not result in a stronger effect than did inhibition
of U1 snRNA alone (Fig. 2a, b and Extended Data Figs. 5a-f, 6). We posit
that precatalytic recognition by U1 snRNP, and to a lesser degree by U2
snRNP, but not splicing per se, primarily controls lncRNA-chromatin
retention. Notably, for promoter-associated upstream antisense (ua)
RNAs and enhancer (e)RNAs, inhibition of U2 snRNA as well as U1
snRNP also led to decreased chromatin but increased cytosolic signals
(Extended Data Fig. 9).

Lastly, to reveal the biological significance of this U1 mechanism, we
focused on Malat1, which binds to thousands of genomic sites to regulate
transcription, pre-mRNA splicing and nuclear architecture.
Mature transcripts of Malat1 are neither spliced nor polyadenylated,
providing an ideal example to reveal how U1 snRNP affects its chromatin
localization, independently of splicing and PAS-mediated RNA
decay. U1 snRNA binds extensively to Malat1 RNA, although it is not
spliced, and inhibition of U1 snRNP led to decreased levels of Malat1
on chromatin without severely altering its expression (Extended Data
Figs. 2c, 5b and 6a). We performed Malat1 chromatin isolation by RNA
purification and sequencing (ChIRP-seq) and RNA fluorescence in situ
hybridization (FISH). Rapid degradation of SNRNP70^AID caused drastically
reduced Malat1 binding at active genes across the genome, and
abolished the punctate, speckle-like staining pattern of Malat1, whereas
the localization of SC35 protein, a marker of nuclear speckles, was
not affected (Fig. 3c, d and Extended Data Fig. 10a-e). Moreover,
treatment with triptolide also abolished Malat1 binding to its own
and other genomic sites (Extended Data Fig. 10d), implying a Pol II
transcription-dependent retargeting of Malat1 to chromatin.
Fig. 3 | U1 tethers and mobilizes lncRNAs to chromatin by interacting with
transcriptionally engaged Pol II. a, Co-immunoprecipitation and western blot
analysis of chromatin fractions released by benzonase; n = 3 independent
experiments. b, Relative chromatin/non-chromatin ratios of representative
lncRNAs and U1 snRNA in various treatments. DMSO, dimethylsulfoxide
(vehicle); Flav, flavopiridol; TPL, triptolide. Data are mean ± s.e.m. P-values were
obtained by two-sided t-test for three biological replicates. *P < 0.05, **P < 0.01.

c, FISH of Malat1 RNA. SNRNP70^AID mESCs (GFP+, dashed circles) and wild-type
mESCs (WT; GFP−, arrows) were mixed and treated with auxin for 4 h. A zoomed-in
view of the boxed region is shown on the right. Hoechst staining indicates DNA.
n = 3 independent experiments; the statistical summary is shown in Extended
Data Fig. 10c. d, Meta-analysis of Malat1 ChIRP-DNA-seq signals across the gene
body of 10,675 expressed and 7,933 unexpressed genes. TSS, transcriptional
start site. e, U1 snRNP mobilizes lncRNAs to chromatin during their synthesis and
function. Top, after an lncRNA (or uaRNA or eRNA) is synthesized, it binds
persistently to U1 snRNP owing to an imbalanced distribution of 5’ and 3’ splice
sites. Bottom, this provides a way to mobilize lncRNA transcripts to nearby gene
or regulatory sites within their chromatin neighbourhoods (for many cis-regulatory
lncRNAs) or to distal chromatin regions (as exemplified by the lncRNA
Malat1). Retargeting of lncRNAs by U1 snRNP to cis and/or trans genomic sites
may be achieved in part through their interactions with engaged Pol II.
Consequently, chromatin-bound lncRNAs may function as an RNA glue to hold
U1 snRNP and the Pol II machinery at regulatory sites, creating a reservoir of
regulatory factors for feedback regulation of transcription and chromatin state.
For gel source data, see Supplementary Fig. 1.


In summary, unlike mRNAs, lncRNAs are enriched with U1-recognition
sites but depleted of 3’ splice sites (Fig. 3e). Dynamic interaction
of U1 snRNP with transcriptionally engaged Pol II may provide a means
of mobilizing U1 snRNP and its interacting lncRNA transcripts to cis and
trans genomic sites for feedback regulation of transcription and chromatin
state. This U1 model provides a parsimonious mechanism that
is generally applicable to hundreds of noncoding transcripts, although
other mechanisms, such as those involving repetitive sequences,
may also exist to achieve lncRNA association with chromatin. This newly
identified function of U1 snRNP adds to other findings that suggest a
role for U1 beyond splicing, such as facilitating promoter directionality
through the U1-PAS axis, and promoting transcriptomic integrity
through U1 telescripting. It is possible that these mechanisms work
in concert to ensure the proper expression and function of lncRNAs
on chromatin (Extended Data Fig. 10f).
Methods


Cell culture, transfection, and treatment 

mESCs (46C) were maintained in complete ESC culture medium
under standard ESC culture conditions as described. HEK 293T
and HeLa cells were maintained in standard eukaryotic cell culture
medium. Cell lines were not authenticated, and were routinely tested
for mycoplasma contamination by polymerase chain reaction (PCR).
Lipofectamine 3000 was used for transfection. The plasmids from the
REL-seq pools or reporters were co-transfected with plasmids expressing
the pBASE transposase. At 12-16 h after transfection, the medium
was replaced with the appropriate medium containing 250 µg ml⁻¹
hygromycin B (Thermo Fisher) for 2 days; the cells were then transferred
into a 10-cm plate for subsequent subcellular fractionation assays. For
drug treatments, under culture conditions, cells were treated with
1 µM E7107 (an inhibitor of U2 snRNP, from Y. Shi’s laboratory), with
1 µM flavopiridol (a transcription inhibitor; Selleck, catalogue number
S1230), or with 1 µM triptolide (another transcription inhibitor;
Abcam, ab120720) for 1 h, or with 100 µM isoginkgetin (an inhibitor of
U4/U6-U5 snRNP; MedChemExpress, HY-N2117) for 2 h.


Construction of the SNRNP70^AID cell line

To construct the auxin-inducible degron (AID) cell line for SNRNP70, we
co-transfected into mESCs an exogenous SNRNP70 expression construct
with its amino terminus fused to 3x FLAG-biotin-AID tags in the PiggyBac
vector (PB-3FB-AID-SNRNP70; hygromycin-resistant), together with
E3 ligase expression plasmids for the AID system (PB-Tir1; neomycin-resistant)
and plasmids expressing the transposase pBASE. After selection
with drugs for five days, endogenous SNRNP70 was deleted through
the CRISPR/Cas9 system and single clones were picked as described.
Clones with homozygous deletion of endogenous SNRNP70 and rapid
removal of exogenous AID-tagged SNRNP70 were used for downstream
experiments. The auxin analogue indole-3-acetic acid (IAA) was used to
induce the degradation of AID-tagged SNRNP70 protein.


Subcellular fractionation assay 

Subcellular fractionation was performed as described, with the following
modifications. Cytoplasmic buffer containing 0.05% or 0.15%
NP-40 was used to isolate the cytoplasmic fraction of mESC or 293T
cells, respectively. The whole of each subcellular fraction was spiked
and mixed thoroughly with 5 µl of 0.1 ng µl⁻¹ spike-in RNA (containing
two in vitro transcribed RNAs, lacZ and mCherry; for polyA RNA-seq
or SLAM-seq, ERCC spike-in (Thermo Fisher) was used). For total RNA-seq,
ribosomal RNAs were removed using Ribo-Zero rRNA removal
kits (Epicentre); for polyA RNA-seq, polyA RNAs were isolated using 
Dynabeads mRNA purification kits (Thermo Fisher). RNA-seq libraries 
were constructed using NEBNext Ultra II directional RNA library prep 
kits (NEB). For reverse transcription with quantitative PCR (RT-qPCR) 
or RNA-seq data analysis, the ratio of each fraction versus total input 
was calculated by normalizing to the spiked transcript. For protein 
analysis, the same ratio of lysate from the different fractions versus 
input was used for western blotting. 

We also performed a subcellular fractionation with sequentially
increasing salt concentrations as described. Briefly, isolated nuclei
were washed for 10 min with soluble nuclear buffer (1 mM EDTA, 0.2 mM
EGTA), then washed sequentially with soluble nuclear buffer supple- 
mented with 75 mM, 150 mM, 300 mM or 600 mM NaCl. The pellet 
was then digested with benzonase nuclease, and the supernatant was 
harvested after spinning at 14,000 r.p.m. for 15 min. The supernatant 
of each step was collected, and the same portion of each collected 
sample was used for further western blotting. 


REL-seq and mutREL-seq 
DNA fragments of candidate genes were obtained from PCR of genomic
DNA (Malat1, Neat1, NXF1-IR (where IR is the retained intron 10 of NXF1),
NR_028425 and the 3’ untranslated region (UTR) of NCL) or complementary
DNA (ACTB, NANOG, 5’-UTR and the coding-sequence
region of NCL). For Xist, the pCMV-Xist plasmid (which also contains
a roughly 2-kilobase region of the last exon of Tsix) was ordered from
Addgene and used for REL-seq. DNA samples were sonicated to obtain
a mixture of short DNA fragments of the expected size (Extended Data
Fig. 1c and Supplementary Table 1). For mutREL-seq, random mutagenesis
was achieved through error-prone PCR as described.

The short DNA fragments were end-repaired and adenylated using a 
DNA library construction kit according to the manufacturer’s instruc- 
tion (NEB). In-house-designed ‘Y-shaped’ adaptors (Supplementary 
Table 9) were then ligated to the prepared fragments, and the ligation 
products were further size-selected by agarose-gel purification. Note 
that the adaptor can be ligated in both sense and antisense directions. 
The reverse-complement inserts, which were generated as an insertion 
by-product, serve as an internal control for the sense strand. The purified
products were amplified, digested by AscI and NotI, and ligated
with AscI- and NotI-digested 5AI, 3AI (which refer to the 5’ or 3’ end of the ACTB
intron) or GFP reporter vectors. The products were further purified by
ethanol precipitation and the pellet was dissolved in 1 µl of water. See
Supplementary Note 1 for a description of the three reporter vectors. 
The purified ligation products were transformed into electrocompe- 
tent cells (Takara) through electroporation and plated evenly on two 
15-cm agar plates. After overnight growth, the cells were harvested by 
scraping, and the plasmids were then extracted and co-transfected 
with pBASE into mESCs or 293T cells as described above. These two 
cell lines were chosen as they can be efficiently transfected, and they 
represent pluripotent and fully differentiated cells in mice and humans,
respectively. After 2 days of drug selection, cells were plated onto a
10-cm plate, and subcellular fractionation was performed and RNA was
extracted from the different fractions using TRIzol reagent. RNAs were
further treated with DNase I for 20 min to remove residual DNA, and
reverse-transcribed with SuperScript III reverse transcriptase (Thermo
Fisher). A specific reverse-transcription primer that binds downstream
of the insertion site was used for reverse transcription (Supplementary
Table 9). Reverse transcription was performed at 50 °C for 40 min and
at 55 °C for another 20 min. Ten units of exonuclease I were added and
incubated at 37 °C for 20 min to remove the free reverse-transcription
primers. The reaction was stopped by heating the sample at 95 °C for
10 min to inactivate all of the enzymes. Then, 0.5 µl of 10 mg ml⁻¹ RNase A
was added and incubation was carried out at 37 °C for 20 min to degrade
the RNA. Complementary DNAs were purified through ethanol precipitation
and used as templates for PCR and library construction.

PCR was performed using primers that bind on either side of the 
inserted fragments. For the 5AI or 3AI reporter, one primer specifically 
targeted the exon junction site of the spliced ACTB intron 3, while the 
other primer targeted the adaptor sequence. For the GFP reporter, 
both of the primers were designed to target the adaptor sequences 
(Supplementary Table 9). The PCR products were then purified using 
1x Ampure XP beads, and libraries for the different samples were con- 
structed using NEBNext Ultra II DNA library prep kits (NEB, E7645). 


RNA pull-down assay 

RNA pull-down assays were performed as reported with some
modifications. Briefly, biotinylated NXF1-enChr sense or antisense
(NXF1-enChr-as) RNA was obtained by in vitro transcription with biotin-UTP
incorporation. We denatured 2 µg of purified biotinylated RNA at
90 °C for 2 min, and then transferred it onto ice for 2 min. We added
a one-quarter volume of 5x RNA structure buffer (final concentration
10 mM Tris pH 7, 0.1 M KCl, 10 mM MgCl2) and incubated the mixture
at room temperature for 20 min. The folded RNA was mixed with the
precleared mESC nuclear extract, and incubated at room temperature
for 1 h. Prewashed streptavidin beads were added and incubated at
room temperature for another 2 h. The beads were washed five times
with RIP wash buffer (200 mM KCl, 25 mM Tris pH 7.4), then eluted with
nuclear lysis buffer (50 mM Tris-Cl, pH 7.5, 10 mM EDTA, 1% SDS) at room
temperature for 2 min. A one-fifth volume of 6x SDS sample buffer
was added and boiled for 5 min at 95 °C, and the protein sample was
used for western blotting or prepared for mass-spectrometry analysis.


AMO inhibition 

Treatment with AMOs for U1, U2 or U1/2 snRNA inhibition was performed
as described with modifications. Briefly, roughly 2.5 x 10°
mESCs were nucleofected with various AMOs by electroporation using
a Nucleofector (Amaxa) and then immediately plated in a six-well plate.
For each treatment with scrambled control (CT), U1 or U2 AMOs, the
AMO concentration was 7.5 nmol per 100 µl (75 µM) per nucleofection.
For inhibition of both U1/2, 5 nmol per 100 µl of U1 and U2 AMOs
(50 µM each) were co-transfected. AMO sequences are listed in Supplementary
Table 9. After 2 h of AMO nucleofection, cells were harvested
for RNA-seq (whole cells) or subcellular fractionation for downstream
experiments and analyses. Note that the AMO-treated mESCs remained
well attached to the culture plate and no obvious morphological
changes were observed at the end of the 2-h treatment time. However,
mESCs appear to be more sensitive to U1 and/or U2 AMOs than were cell
lines used in previous studies, where AMO treatment was performed
for 8 h (refs. 75304). After 4-h treatment with U1, U2 or U1/2 AMOs,
mESCs started to become detached from the culture plate, indicating
cell death. Extensive cell death was observed at 8 h of AMO nucleofection.
Therefore, we chose to analyse the immediate effects of U1 and/or
U2 inhibition on RNA subcellular localization after 2 h of treatment
instead of 8 h. Short-term treatments appear to be suitable for studying
lncRNA and ncRNA transcripts with relatively short half-lives.
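
The AMO doses above are quoted both as an amount per nucleofection volume and as a molar concentration. As a quick arithmetic check, the conversion can be sketched in Python; the helper name `nmol_per_ul_to_um` is illustrative and is not part of the published protocol.

```python
def nmol_per_ul_to_um(amount_nmol: float, volume_ul: float) -> float:
    """Return the concentration in µM for `amount_nmol` dissolved in `volume_ul`."""
    # 1 nmol per µl equals 1 mM, that is 1,000 µM.
    return amount_nmol * 1000.0 / volume_ul


single_amo_um = nmol_per_ul_to_um(7.5, 100.0)  # control, U1 or U2 AMO alone
dual_amo_um = nmol_per_ul_to_um(5.0, 100.0)    # each AMO in the U1/2 co-treatment
print(single_amo_um, dual_amo_um)
```

Both values agree with the concentrations stated in the text (75 µM for a single AMO and 50 µM for each AMO in the combined treatment).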


Co-immunoprecipitation 

Co-immunoprecipitation was performed as described with some
modifications. mESC nuclei were isolated using hypotonic buffer
and washed with benzonase digestion buffer (50 mM HEPES, pH 7.5,
200 mM NaCl, 5 mM MgCl2, 0.1 mM Na3VO4, 0.1% NP-40, 10% glycerol)
at 4 °C for 30 min. The pellet was resuspended with benzonase digestion
buffer containing 1 µl benzonase nuclease (Sigma, E1014) and
rotated at 4 °C for 30 min to isolate the chromatin extract. For antibody
immunoprecipitation, we added 2 µg of antibody to each sample and
used protein G magnetic beads (Thermo Fisher) for the pull-down.
FLAG M2 magnetic beads (Sigma) were used for FLAG immunoprecipitation.

For co-immunoprecipitation coupled with stable-isotope labelling by 
amino acids in cell culture followed by mass spectrometry (coIP-SILAC), 
wild-type mESCs were cultured with heavy SILAC media and mixed with 
equal amounts of SNRNP70^AID mESCs (FLAG-tagged) cultured in light 
SILAC media. The co-immunoprecipitation procedure was as described 
above, except that pull-down was performed with FLAG beads and the 
proteins were eluted with FLAG peptide. The eluate was concentrated 
through acetone precipitation. Protein identity and relative enrichment 
were determined by mass spectrometry. Proteins with scores larger 
than 20 and light/heavy ratios larger than 2 were defined as SNRNP70- 
interacting proteins (Supplementary Table 8). 
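
The two cutoffs used to call SNRNP70-interacting proteins can be written as a small filter. This is a minimal sketch, assuming each identified protein comes with a score and a light/heavy ratio; the `ProteinHit` class, the function name and the example values are hypothetical and do not come from Supplementary Table 8.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    name: str
    score: float              # identification score reported for the protein
    light_heavy_ratio: float  # light (FLAG-tagged cells) over heavy (wild type)


def select_interactors(hits: List[ProteinHit],
                       min_score: float = 20.0,
                       min_ratio: float = 2.0) -> List[str]:
    """Keep proteins whose score and light/heavy ratio both exceed the cutoffs."""
    return [h.name for h in hits
            if h.score > min_score and h.light_heavy_ratio > min_ratio]


# Hypothetical values, for illustration only.
example_hits = [
    ProteinHit("protein_A", score=85.0, light_heavy_ratio=6.3),
    ProteinHit("protein_B", score=15.0, light_heavy_ratio=9.0),  # score too low
    ProteinHit("protein_C", score=40.0, light_heavy_ratio=1.2),  # ratio too low
]
selected = select_interactors(example_hits)
```

Both comparisons are strict, following the wording "larger than" in the text.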


SLAM-seq of subcellular fractionation samples 

The SLAM-seq protocol was modified from a previous report. Briefly,
SNRNP70^AID mESCs were treated with auxin for 2 h, then allowed to
incorporate 4sU (300 mM final) for 3 h in the presence of auxin. The
purpose of this 3-h 4sU labelling is to make sure that most of the new
transcripts are fully transcribed, processed and localized. SNRNP70^AID
mESCs treated with 4sU but without auxin were used as controls. Subcellular
fractionation was performed as described above and RNAs
from each fraction were extracted. The RNAs were further treated with
chemical reactions as described and 3’-end mRNA-seq libraries were
constructed using a commercially available kit (QuantSeq 3’ mRNA-Seq
library prep kit, Lexogen). Deep sequencing was performed and data
were analysed using SLAM-DUNK, with the last exon of lncRNAs as the
reference. The remaining parameters were left as defaults.


TT-seq 

The TT-seq protocol was modified from a previous report. Auxin-treated
or untreated SNRNP70^AID mESCs were allowed to incorporate
4sU (500 mM final) for 10 min; cells were then harvested and RNAs
were extracted by TRIzol. Labelled RNAs were further biotinylated as
described and fragmented with 0.1x NaOH on ice for 10 min. The fragmented
RNAs were subjected to purification as described. RNA-seq
library construction, deep sequencing and data analysis were further 
performed as described above and below. 


RNA FISH 

Malat1 single-molecule inexpensive FISH (smiRNA FISH) was performed
as described. Immunofluorescence was carried out according to the
protocol provided by Cell Signaling Technology. To reduce variations
arising from experimental or imaging procedures, we constructed a
SNRNP70^AID mESC line with stable integration of a GFP gene (GFP+).
SNRNP70^AID-GFP+ mESCs were mixed with wild-type GFP-negative
mESCs (GFP−) before plating for RNA FISH or immunofluorescence.
The probes used for Malat1 FISH are listed in Supplementary Table 9.


ChIRP-seq 

Auxin-treated or untreated SNRNP70^AID cells were crosslinked using
2 mM dithiobis succinimidyl propionate (DSP) for 30 min, followed by
15 min of crosslinking with 3.7% formaldehyde. The crosslinked cells
were first partially digested with 12 U ml⁻¹ DNase I at 37 °C for 10 min,
then sonicated at 25% amplitude for 30 s. Subsequent procedures were
performed as described. The probes used for Malat1 ChIRP are listed
in Supplementary Table 9. For ChIRP RNA, auxin-treated or untreated
SNRNP70^AID mESCs were crosslinked with 2% formaldehyde for 10 min.
Subsequent procedures were performed as described.


REL-seq data analysis 

Sequences corresponding to vectors and adaptors were removed from 
the REL-seq sequencing data using BEDTools. The remaining ‘clean’
reads were mapped to the mouse genome (mm9) through TopHat. To
compare read densities, we divided the genomic regions of candidate 
genes into multiple bins, each ten base pairs in length. For each sam- 
ple, the number of reads located in each bin was counted and further 
normalized by the total reads mapped to those bins. The fold changes 
were calculated by dividing the read density of the chromatin fraction 
by the read density of the cytoplasm fraction. To identify the chroma- 
tin-enriched regions, we calculated the fold changes of the different 
samples, and carried out a t-test to compare the fold change of each 
sample with the median fold change of the corresponding sample. Only 
a bin with a t-test P-value of less than 0.05 and a fold change greater 
than 1.5 was recognized as a bin with significant chromatin enrich- 
ment. Neighbouring bins, within a distance of 50 nucleotides, were 
merged as a region with significant chromatin enrichment. As most 
of the inserts were much larger than 50 nucleotides, merged inserts 
with a length of less than 50 nucleotides were discarded. Because we 
used very stringent selection criteria across all REL-seq data sets generated
with different libraries in three AI (ACTB intron) and GFP reporter
vectors in both human and mouse cell lines, the enChrs we identified 
may underrepresent the total number of regulatory RNA signals for 
chromatin association in the cell. 
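
The binning and merging steps described above can be sketched as follows. This is an illustrative outline only, not the authors' pipeline: it assumes reads are represented by a single 0-based position within the candidate region, it adds a small pseudocount to avoid division by zero, and it takes the per-bin t-test P-values as an input rather than computing them. All function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

BIN_SIZE = 10     # bin width in base pairs
MERGE_GAP = 50    # merge significant bins at most this many nucleotides apart
MIN_REGION = 50   # discard merged regions shorter than this


def bin_density(read_positions: Iterable[int], region_length: int,
                bin_size: int = BIN_SIZE) -> List[float]:
    """Count reads per bin and normalize by the total reads in those bins."""
    n_bins = (region_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        if 0 <= pos < region_length:
            counts[pos // bin_size] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def fold_change(chromatin: Sequence[float], cytoplasm: Sequence[float],
                pseudo: float = 1e-6) -> List[float]:
    """Chromatin over cytoplasm read density per bin (pseudocount is an assumption)."""
    return [(c + pseudo) / (y + pseudo) for c, y in zip(chromatin, cytoplasm)]


def significant_bins(fold: Sequence[float], p_values: Sequence[float],
                     min_fold: float = 1.5, alpha: float = 0.05) -> List[int]:
    """Indices of bins with fold change > 1.5 and P < 0.05."""
    return [i for i, (f, p) in enumerate(zip(fold, p_values))
            if f > min_fold and p < alpha]


def merge_bins(bins: Iterable[int], bin_size: int = BIN_SIZE,
               max_gap: int = MERGE_GAP,
               min_len: int = MIN_REGION) -> List[Tuple[int, int]]:
    """Merge nearby significant bins into regions and drop short regions."""
    regions: List[List[int]] = []
    for i in sorted(bins):
        start, end = i * bin_size, (i + 1) * bin_size
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = end          # extend the current region
        else:
            regions.append([start, end])  # open a new region
    return [(s, e) for s, e in regions if e - s >= min_len]
```

Regions are returned as half-open `(start, end)` coordinates relative to the start of the candidate gene region.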

To identify inserts that were enriched in different fractions, we 
reconstructed inserts by identifying their ends through the mapping 
of paired-end reads. The abundance of different inserts in the differ- 
ent fractions was counted, and fold changes of different clones were 
calculated by dividing the insert abundance in the chromatin fraction 
by the insert abundance in the cytoplasm fraction. Only inserts with a
maximum read number larger than ten and a minimum no less than one


were used for the analysis. The cutoff of fold changes for chromatin- 
enriched inserts is annotated in the relevant figure legends. 
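
The insert-level filter and ratio can be illustrated in the same way. The sketch below assumes that the abundance of each reconstructed insert is available as a pair of read counts (chromatin, cytoplasm); the function name and the example counts are hypothetical.

```python
from typing import Dict, Tuple


def insert_fold_changes(counts: Dict[str, Tuple[int, int]],
                        min_max_reads: int = 10,
                        min_min_reads: int = 1) -> Dict[str, float]:
    """Map insert id -> chromatin/cytoplasm ratio for inserts passing the read filter.

    `counts` maps an insert id to (chromatin_reads, cytoplasm_reads). An insert is
    kept when its larger count exceeds ten and its smaller count is at least one.
    """
    kept: Dict[str, float] = {}
    for insert_id, (chrom, cyto) in counts.items():
        if max(chrom, cyto) > min_max_reads and min(chrom, cyto) >= min_min_reads:
            kept[insert_id] = chrom / cyto
    return kept


# Hypothetical counts, for illustration only.
example_counts = {"insert_a": (40, 10), "insert_b": (8, 4), "insert_c": (30, 0)}
ratios = insert_fold_changes(example_counts)
```

Requiring at least one read in the weaker fraction also guarantees that the ratio never divides by zero.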

For NXF1-enChr mutREL-seq, ‘clean’ reads were mapped through 
Novoalign onto a reference sequence, built in-house, which contained 
only the sequence of the NXF1-enChr region. The different mutations 
at each position were counted. The cytoplasmic/chromatin ratio 
of each mutation was calculated by dividing the normalized reads 
(normalized by wild-type reads) in the cytoplasmic fraction by that in 
the chromatin fraction. 


RNA-seq data analysis 

RNA-seq data from fractionation of human K562 cells were downloaded
from the ENCODE project (https://www.encodeproject.org).
Alignments of RNA-seq data to human genome assembly hg19 were
performed using TopHat v2.0.10. Fragments per kilobase of
exon model per million mapped reads (FPKM) were calculated using
Cufflinks 2.1.1 to represent expression levels of transcripts. Gencode v19
was used as the human gene annotation. Similarly, RNA-seq data from 
subcellular fractionation of mouse cells were mapped to mouse genome 
assembly mm10 (for comparisons and correlation analyses) or mm9 
(for tracks showing sequencing signals), and the corresponding FPKM 
were calculated with the Gencode vM9 annotation. For data analysis of 
strand-specific RNA-seq data, the strand information of each mapped 
read was first converted into absolute strand information relative to the 
genome; then the FPKM was calculated through Cufflinks 2.1.1 using 
stranded RNA-seq parameters. 

For RNA-seq analysis of whole-cell and subcellular fractionated sam- 
ples, raw reads of mouse subcellular fractionation RNA-seq data were 
mapped to mouse genome assembly mm9 for sequencing signal analy- 
sis tracks or mm10 for FPKM calculations. For the analysis of relative 
gene abundance in different subcellular fractions, we constructed an 
in-house reference file by combining the genome assembly mm10 with
our in-house spike-in sequences (lacZ and mCherry). The RNA-seq data
of each fraction were mapped to the in-house reference. The FPKM 
value of each gene was calculated (with Gencode vM9 annotation), and 
the ratio of reads that mapped to spike-in was also calculated ((reads 
mapped to spike-in)/(total mappable reads)). The FPKM value of each 
gene was further normalized by the corresponding ratio of spike-in, 
and the ratio of each fraction was calculated by dividing the normalized 
FPKM value of each fraction by the sum of all three fractions (cytosol 
plus nucleus plus chromatin). The sequencing signal track of subcel- 
lular fractionation samples was also normalized by the ratio of spike-in. 
LncRNAs with a minimum FPKM of more than 0 and a maximum FPKM
in all samples of more than 1 were chosen as mESC-expressed lncRNAs.
To identify U1-snRNP-regulated lncRNAs, we first normalized the RNA
signals in each compartment to spike-in RNA controls and then calculated
relative enrichments on chromatin (chromatin/non-chromatin
ratio) by comparing normalized RNA signals in the chromatin to the
non-chromatin (cytoplasm plus nucleoplasm) fractions. Only those
lncRNAs that showed greater decreases in the chromatin fraction
than in the nucleoplasm and cytoplasm fractions (a chromatin/non-chromatin
ratio of less than one; P < 0.05) upon U1 snRNP depletion
were selected as U1-snRNP-regulated lncRNAs. In addition, we also
compared lncRNA abundance in each of the chromatin, cytoplasm and
nucleoplasm fractions to the total amount of lncRNA in all fractions,
depicted as ‘chromatin/total’, ‘cytoplasm/total’ and ‘nucleus/total’.
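
The spike-in normalization and the fraction ratios described in this section can be sketched as below. The text states that FPKM values were "normalized by" the spike-in ratio; dividing by that ratio is an assumption made here, as are the function names and the example numbers.

```python
from typing import Dict

FRACTIONS = ("cytoplasm", "nucleoplasm", "chromatin")


def spike_normalized(fpkm: float, spike_reads: int, mappable_reads: int) -> float:
    """Scale an FPKM value by the share of reads that mapped to the spike-in.

    Dividing by the spike-in ratio is an assumption: a fraction that holds less
    RNA yields a larger spike-in share, so its FPKM values are scaled down.
    """
    spike_ratio = spike_reads / mappable_reads
    return fpkm / spike_ratio


def fraction_shares(normalized: Dict[str, float]) -> Dict[str, float]:
    """Share of a transcript in each fraction relative to the sum of all three."""
    total = sum(normalized[f] for f in FRACTIONS)
    return {f: normalized[f] / total for f in FRACTIONS}


def chromatin_ratio(normalized: Dict[str, float]) -> float:
    """Chromatin over non-chromatin (cytoplasm plus nucleoplasm)."""
    return normalized["chromatin"] / (normalized["cytoplasm"] + normalized["nucleoplasm"])


# Hypothetical numbers for one lncRNA, for illustration only.
example = {
    "cytoplasm": spike_normalized(2.0, 512, 1024),
    "nucleoplasm": spike_normalized(4.0, 256, 1024),
    "chromatin": spike_normalized(10.0, 128, 1024),
}
shares = fraction_shares(example)
ratio = chromatin_ratio(example)
```

In this toy example, 80% of the normalized signal is on chromatin, which gives a chromatin/non-chromatin ratio of 4.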

We used the following published data sets in our analysis: total RNA-seq
of whole cells and the chromatin fraction of K562 cells, GSE30567;
U1 and Malat1 RAP in mESCs, GSE55914; Pol II NTD ChIP-exo-seq in
mESCs, GSE64825; Pol II 8WG16 ChIP-seq in mESCs, GSE49847; Ser 2
and Ser 5 phosphorylated Pol II ChIP-seq in mESCs, GSE112114.


Correlation analysis 
We used two methods, bioinformatics prediction and U1 RAP-RNA, to
analyse the U1-recognition site. Prediction of U1 snRNP recognition
sites was performed as described. On the basis of their maximum
entropy scores, we categorized the predicted U1 snRNP recognition
sites into predicted strong (top 50% of sites) and medium (bottom 25%
to 50% of sites). We calculated the number of U1-recognition sites in
the intron and exon regions for each transcript. The average density
of predicted U1-recognition sites was calculated by dividing the total
number of predicted U1-recognition sites in all exons (or all introns,
and so on) by the total length of all exons (or all introns and so on).
Only transcripts with a total exon length (or intron length) greater
than 1 kilobase were chosen for further comparison of the density of
predicted U1-recognition sites in lncRNAs and mRNAs.
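
The density calculation can be illustrated with a short sketch. It assumes that sites and lengths are pooled across the transcripts of a class before dividing, which is one reading of the description above, and it reports the result per kilobase for readability; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def pooled_site_density(regions: Iterable[Tuple[int, int]],
                        min_length: int = 1000) -> float:
    """Predicted sites per kilobase, pooled over transcripts longer than `min_length`.

    `regions` holds one (site_count, total_region_length_bp) pair per transcript,
    for example the summed exons or the summed introns of that transcript.
    """
    kept = [(n, length) for n, length in regions if length > min_length]
    total_length = sum(length for _, length in kept)
    if total_length == 0:
        return 0.0
    total_sites = sum(n for n, _ in kept)
    return 1000.0 * total_sites / total_length
```

For instance, two transcripts with 4 and 2 predicted sites over 2,000 bp of exon each give a pooled density of 1.5 sites per kilobase, and a third transcript with only 500 bp of exon would be excluded by the length filter.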

Only genes with FPKM values of more than 1 were used to compare 
mRNAs with different chromatin-association tendencies and lncRNAs.
As RNA-seq data of mouse subcellular fractions are unstranded, lncRNA
transcripts that overlapped with coding genes were discarded. Transcripts
that are entirely located in repeat elements were also discarded.
The fold enrichment of each transcript was calculated by dividing the 
FPKM in chromatin RNA-seq data by the FPKM of the respective total 
RNA-seq data. For Extended Data Fig. 4e, we used FPKM values of tran- 
scripts (isoforms) rather than genes for analysis, as different isoforms 
of the same gene may show a different chromatin-association tendency. 

For U1 RAP-RNA analysis, we obtained U1 RAP-RNA data from a published
data set (GSE55914), which used formaldehyde crosslinking to obtain the
U1 snRNA-interacting RNAs. We first calculated the number of reads in the
intron and exon region for each transcript. Only those transcripts containing
at least ten reads in each sample were kept for further analysis. To calculate
the U1 enrichment for the intron and exon region of a target RNA, we divided
the reads mapping to the target exon or intron of each transcript in
RAP-RNA by the respective input with normalization of sequencing
depth. The fold enrichment of the U1 RAP-RNA signal in lncRNAs and
coding genes was calculated and compared. The t-test (two-sided) was 
used to calculate the significance of the difference between two groups. 
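
The depth-normalized RAP-over-input calculation can be written compactly. This is a sketch under the assumption that sequencing depth is represented by the total number of mapped reads in each library; the function name is hypothetical.

```python
from typing import Optional


def rap_fold_enrichment(rap_reads: int, input_reads: int,
                        rap_depth: int, input_depth: int,
                        min_reads: int = 10) -> Optional[float]:
    """Depth-normalized RAP over input signal for one exon or intron region.

    Returns None when either sample has fewer than `min_reads` reads, which
    mirrors the filter of at least ten reads in each sample.
    """
    if rap_reads < min_reads or input_reads < min_reads:
        return None
    return (rap_reads / rap_depth) / (input_reads / input_depth)
```

With equal library depths the result reduces to the plain ratio of read counts; doubling the RAP library depth halves the enrichment for the same counts.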


Genome-wide prediction of 3’ splice sites 

To identify the genome-wide distribution of predicted 3’ss, we gener- 
ated a fasta file containing all 4,096 possible arrangements of eight- 
nucleotide motifs ending with ‘AG’. Those motifs were mapped to the 
mouse or human genome through bowtie-1.0.0 to obtain the genomic 
distribution of each eight-nucleotide motif. Each mapped coordinate 
plus 12 nucleotides upstream and 3 nucleotides downstream (23 nucleotides
in total) was used to calculate the 3’ss score (by MaxEntScan).
Each coordinate with a score larger than 8.0 was recognized as a
predicted 3’ splice site (the median scores of Gencode vM9 annotated
3’ splice sites were 8.7 in coding genes and 8.0 in lncRNAs).
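
The motif enumeration and the 23-nucleotide scoring window can be sketched as follows. The code only builds the motif list and extracts the window on the forward strand; the genome mapping (bowtie) and the scoring itself (MaxEntScan) are not reproduced, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Optional


def ag_motifs() -> List[str]:
    """All eight-nucleotide motifs that end with 'AG' (4**6 = 4,096 sequences)."""
    return ["".join(prefix) + "AG" for prefix in product("ACGT", repeat=6)]


def scoring_window(sequence: str, motif_start: int, motif_len: int = 8,
                   upstream: int = 12, downstream: int = 3) -> Optional[str]:
    """Return the motif plus 12 nt upstream and 3 nt downstream (23 nt in total).

    `motif_start` is the 0-based position of the motif on the forward strand.
    Returns None when the window would run past either end of `sequence`.
    """
    start = motif_start - upstream
    end = motif_start + motif_len + downstream
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


motifs = ag_motifs()
toy_sequence = "A" * 12 + "CCCCCCAG" + "GTA"  # toy sequence, not genomic data
window = scoring_window(toy_sequence, motif_start=12)
```

The window therefore covers 20 nucleotides ending in the AG dinucleotide plus the 3 nucleotides that follow it.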


Other bioinformatics analyses 

The secondary structures of the wild-type and mutated NXF1-enChr 
regions were predicted from the Vienna RNA website with default 
parameters. The analysed REL-seq tracks and other tracks are shown
in Integrative Genomics Viewer (IGV). Metaplots were drawn using
ngs.plot.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All sequencing data are available in the Sequence Read Archive database 
under accession numbers SRP214639 and SRP125289. For gel source 
data, see Supplementary Fig. 1. Source data for Fig. 2a–c are provided 
with the paper (Supplementary Tables 4-7). All other data are available 
from the corresponding author upon reasonable request. Sequenc- 
ing data have been deposited in the Gene Expression Omnibus under 
accession numbers GSE107131 and GSE134287. 
Extended Data Fig. 1 | REL-seq and mutREL-seq for identification of 
cis elements that contribute to the subcellular localization of RNA. 

a, Comparisons of chromatin enrichment between lncRNAs and protein- 
coding mRNA genes in human and mouse. We fractionated chromatin-bound 
RNA in mESCs and compared the RNA-seq profiles of chromatin RNA to total 
RNA from whole cells. We also analysed a previously published RNA-seq data 
set (chromatin and cell) from human K562 cells**. LncRNAs with FPKM values of 
greater than 1 and protein-coding genes with FPKM values of greater than 5 
were used for the analysis. Data for samples connected by brackets were 
compared with two-sided t-tests. Consistent with previous reports°®, lncRNAs 
as a class are significantly enriched in the chromatin fraction in both mouse 
and human cells. Box plots show 5th, 25th, 50th, 75th and 95th percentiles, with 
median values labelled beside the box plots and sample sizes (n > 1,114) labelled 
on the x axis. b, Detailed pipelines for REL-seq and mutREL-seq. For REL-seq, 
DNA from candidate genes is randomly fragmented, ligated with 
adaptors, amplified, and inserted into three types of reporters in the PiggyBac 
transposon vector (including 5AI, 3AI and GFP reporters). The reporters are 
co-transfected with PiggyBac transposases (pBASE) and stably integrated into 
the genome. RNA from the cytoplasm (cyto) and chromatin (chr) fractions is 
reverse-transcribed (RT) with primers indicated by purple arrows, and 
amplified with primers P1 and P2 for subsequent high-throughput sequencing. 
Sequences that are enriched in different subcellular fractions are identified by 
comparing the read intensities or insert abundance in the chromatin fraction 
with that of the cytoplasmic fraction. For mutREL-seq, a candidate fragment 
(NXF1-enChr) is randomly mutagenized through error-prone PCR, and the 
products are further inserted into the 5AI reporter vector and subjected to 
downstream procedures similar to those described for REL-seq. Asterisks 
represent mutation sites. ITR, inverted terminal repeat sequences of the 
PiggyBac transposon system. See also Supplementary Note 1. c, Box plots 
showing the length of inserts of RTM (RNA transcript mixture, without Xist) 
and Xist REL-seq libraries in 5AI, 3AI and GFP reporters (i) and 5AI-short 
reporters (ii). Box plots show 5th, 25th, 50th, 75th and 95th percentiles. 
In i, n = 20,000 randomly selected inserts for each group. In ii, n = 19,748 for 5AI- 
short-1; n = 15,091 for 5AI-short-2. d, Western blot analysis of marker proteins in 
subcellular fractions of mESCs or HEK 293T cells. Tubulin and histone H3 are 
used as marker proteins for the cytoplasmic and chromatin fractions, 
respectively. n=3 independent experiments. e, f, RT-qPCR analysis of the 
relative abundance of marker genes and candidate genes for REL-seq in 
subcellular fractions of mESCs (e) or human HEK 293T cells (f). ACTB and Xist 
are used as markers of mouse cytoplasmic and chromatin fractions, 
respectively; hACTB and hACTB-intron (intronic region of hACTB) are used as 
markers of human cytoplasmic and chromatin fractions, respectively. 

g, Numbers of mutation or deletion events expected and identified by mutREL- 
seq. A total of 469 mutation events, including 374 mutations (coverage 99.7%) 
and 85 deletions (coverage 68%), were identified over the 125-base-pair (bp) 
length of the NXF1-enChr DNA, indicating saturated mutagenesis. h, Analysis of 
the mutation rate. PCR mutations are spread across the NXF1-enChr sequence, 
ruling out a PCR bias towards the core 7-nucleotide mutations at positions 37-45. 
Notably, the two binding sites for PCR primers at the 5’ (1-18-bp) and 3’ 
(144-162-bp) ends of NXF1-enChr were less likely to be mutated, compared to 
the middle region with an average of 0.2%-3% mutation rate at each nucleotide 
position. This excludes the possibility that the sequences were misread. 

i, Predicted secondary structure of NXF1-enChr RNA. The U1-recognition site at 
positions 37-45 is highlighted with a thick blue line. A weak U1-recognition site 
at positions 53-59 is highlighted with a thin blue line. The coloured bar 
represents the probability of base-pairing or being unpaired (red, high 
probability of pairing (or lack of pairing); blue, low probability). j, Comparison 
of predicted secondary structures of the wild-type (i) and mutant (ii, iii) NXF1- 
sites. See also Supplementary Note 2. k, The percentage of enChrs or all target 
sequences (‘total’ = (sum of length)/(median length of enChr)) used for REL-seq 
analysis that overlap with predicted U1-recognition sites. P-values (one-sided 
Fisher’s exact test) and sample sizes are shown at the top. 
Extended Data Fig. 2 | Genome-browser views of representative genes 
showing REL-seq results. The enChrs (shown as thick black bars, also 
highlighted by dashed boxes) indicate regions with significant chromatin 
enrichment that were identified by AI/GFP or AI reporter screens (P < 0.05, fold 
change greater than 1.5). Representative chromatin-enriched inserts from 
respective reporters (5AI-short or GFP) are shown at the bottom of each panel. 
a, b, No enChrs were identified by REL-seq in the cytoplasm-localized protein- 
coding ACTB (a) and NANOG (b) transcripts. In these transcripts, U1 signals are 
mainly confined to the intronic regions and appear to be depleted in exons. 
Scales are shown in square brackets at the right of each track. c-g, REL-seq 
identified multiple enChrs in mouse Malat1 (c), NR_028425 (d), Neat1 (e), Xist (f), 
and NXF1-IR (g) transcripts. In c, two mouse regions homologous to regions E 
and M in human Malat1 are shown with thick blue lines. In f, only minus-strand 
tracks are shown. The plus-strand tracks for Tsix, which is transcribed in the 
antisense direction to Xist RNA, are shown in Extended Data Fig. 6c. The 
locations of different repeats in Xist are shown as thick blue lines. In g, multiple 
strong U1-recognition sites are clustered in the 162-nucleotide NXF1-enChr 
(highlighted by dashed boxes) and nearby sequences. U1 snRNP also binds 
strongly (roughly 70-fold maximal enrichment versus the input) to this region, 
with peak signals that are centred at the 7-nucleotide U1 motif revealed by 
mutREL-seq (Fig. 1a), providing evidence for direct interactions of NXF1-enChr 
RNA and U1 snRNP in vivo. The predicted strong or medium U1-recognition 
sites are shown underneath the NXF1 gene annotation. RNA-seq signals of U1 
RAP-RNA with formaldehyde (FA) or 4’-aminomethyltrioxsalen (AMT) 
crosslinking and respective input controls are also shown. AMT generates 
interstrand crosslinks between uridine bases to detect the direct RNA-RNA 
interactions of highly expressed transcripts”. FA stabilizes both direct and 
indirect interactions of proteins and nucleotides. Also shown is the average 
fold change (log2) of the read intensity of the chromatin fraction compared 
with that of the cytoplasmic fraction in inserts from AI/GFP reporters or AI 
reporters. Red lines represent chromatin enrichment, while green lines 
represent chromatin depletion, of signals. 
Extended Data Fig. 3 | Reporter assays reveal a key role of the U1 motif and U1 
snRNP in RNA-chromatin association. a, Diagrams showing the GFP reporters 
that we constructed and analysed here. EGFP, enhanced GFP; EV, empty vector; 
PAS, bovine growth hormone (BGH) polyadenylation signal; 3’ Malat1, 3’ 
termination sequence of Malat1; U1 (28-nt), a short 28-nucleotide NXF1-enChr 
sequence that encompasses two U1-recognition motifs (one strong and one 
weak site from position 32 to 59 in NXF1-enChr). The NXF1-enChr GFP reporters 
(PAS and 3’ Malat1) contain the 162-nucleotide NXF1-enChr sequences with 
either a wild-type U1-recognition site or a mutated site (mutU1, vertical black 
line). To rule out an indirect effect of RNA degradation on its chromatin 
localization, we replaced the PAS with the 3’ termination sequence of Malat1. 
The 3’ end of Malat1 possesses a triple-helix structure, which resembles the 
viral expression and nuclear retention element (ENE) and stabilizes the Malat1 
transcripts”'*. To assess the specificity of U1-mediated chromatin association, 
we also constructed GFP reporters carrying RepA alone and RepA with 
NXF1-enChr. We used two primer pairs (q1F/q1R and t1F/t1R; red arrows) to 
analyse the expression and potential splicing of the insert. b, The insertion of 
NXF1-enChr (U1) did not elicit splicing. The agarose gel picture shows PCR 
products amplified by primers t1F and t1R in mESCs expressing the NXF1-enChr 
(PAS) or NXF1-enChr-mutU1 (PAS) constructs. cDNAs were used as templates 
and the corresponding plasmids were used as control templates. n = 2 
independent experiments. c, Subcellular fractionation and RT-qPCR analysis 
of GFP RNA in various reporters. Endogenous expression of ACTB and Malat1 
serves as internal controls. d, GFP fluorescence imaging of mESCs expressing 
the EV, NXF1-enChr (PAS) or NXF1-enChr-mutU1 (PAS) constructs shown in a. 
GFP fluorescence is much weaker in cells expressing the NXF1-enChr-PAS 
construct. n = 4 independent experiments. e, RT-qPCR analysis of chromatin/ 
cytosol ratios (top) and relative expression (bottom) of GFP RNA in mESCs. The 
relative chromatin/cytosol ratio was normalized to a spike-in RNA that was 
prepared by in vitro transcription. Data shown as mean ± s.e.m.; n = 2 biological 
replicates. f, RT-qPCR analysis of the chromatin/non-chromatin ratio (i) and 
relative expression (ii) of GFP RNA in human HEK 293T and HeLa cells. Similar 
results to those shown in e were observed in human cells, suggesting a 
conserved U1-based mechanism in humans and mice. g, Additive effect of 
multiple enChrs in promoting RNA-chromatin association. The RepA-NXF1- 
enChr reporter exhibited significantly higher chromatin enrichment than 
RepA or NXF1-enChr alone. In f, g, data are shown as mean ± s.e.m.; P-values 
obtained by two-sided t-test with three biological replicates. h, Representative 
proteins identified by NXF1-enChr RNA pull-down assay. The mass- 
spectrometry scores of proteins identified by NXF1-enChr and NXF1-enChr-as 
RNA pull-down are shown, together with their fold enrichment in the 
NXF1-enChr sample relative to the antisense control (enChr-as). Sm proteins are 
general components of snRNPs that bind snRNAs. i, Western blot confirming 
the specific interaction between SNRNP70 and NXF1-enChr. Controls were the 
antisense sequence and the sequence with a deletion of the strong U1- 
recognition site (ΔU1). n = 3 independent experiments. j, RNAi in mESCs 
harbouring the NXF1-enChr (PAS) GFP reporter. Depletion, using short hairpin 
RNA (shRNA) knockdown (KD), of the three core components of U1 snRNP 
(SNRNP70, SNRPA and SNRPC), but not of SPEN or splicing regulators (RBM6, 
U2AF1/2 and SF3A2), led to 2.5-4.5-fold increases in GFP signals analysed by 
FACS. k, Knockdown efficiency of SNRNP70 using two shRNAs in mESCs. Top, 
RT-qPCR, mean ± s.e.m.; bottom, western blot. n = 2 independent experiments. 
l, m, RT-qPCR analysis of the chromatin/cytosol ratio (top) and relative 
expression (bottom panel) of GFP RNA in mESCs after knocking down SNRNP70 
(l) or SPEN (m) in mESCs expressing the NXF1-enChr-U1 reporter or RepA-GFP 
reporter. Means ± s.e.m. are shown. n = 2 biological replicates. Scr, scrambled 
control shRNA. n, Mutation analysis of the 5’ and 3’ splice sites using a GFP 
reporter carrying the intron-3 sequence of ACTB. i, The mutation scheme. 
ii, RT-qPCR analysis of relative chromatin/non-chromatin ratios (left) and 
expression (right) of GFP RNA in mESCs expressing different constructs. 
iii, PCR bands of spliced and unspliced transcripts. Splicing was abolished in 
the 3’ss and 5’+3’ss mutants. Splicing was detected in the 5’ss mutant reporter 
(red asterisk) owing to the presence of an alternative 5’ss downstream of the 
mutated site. iv, Results of sequencing PCR fragments of the 5’ss mutant 
reporter. Data are shown as mean ± s.e.m., and include three biological 
replicates. P-values obtained from two-sided t-test. See also Supplementary 
Notes 3 and 4. For gel source data, see Supplementary Fig. 1. 
Extended Data Fig. 4 | Differential distributions of U1-recognition sites and 
3’ splice sites in mRNA and lncRNA genes. a, Average density of predicted 
strong U1-recognition sites in genic regions (i) and repeat elements (ii), and 
average density of predicted 3’ss in genic regions (iii), in humans (top panels) 
and mice (bottom panels). Random sequences, random intergenic sequences, 
and reverse-complementary sequences serve as controls for the background. 
LINE and SINE, long and short interspersed nuclear elements; LTR, long 
terminal repeat. b, Comparison of the density of predicted strong U1- 
recognition sites in the reverse-complementary strand of exons (top) and of 
3’ss in the gene-body region (bottom) of lncRNAs (U1 site, n = 4,731; 3’ss, 
n = 21,512) and mRNA genes (U1 site, n = 68,881; 3’ss, n = 139,458) in humans. 
Only genes with transcript lengths (for U1 site) or genomic lengths (for 3’ss) 
larger than 1 kilobase were analysed. c, Densities of predicted medium- 
strength U1-recognition sites in exons of lncRNAs and mRNA genes in humans 
(top; lncRNA, n = 4,731; mRNA, n = 68,881) and mice (bottom; lncRNA, n = 3,385; 
mRNA, n = 47,298). Transcripts with total exon lengths of more than 1,000 
nucleotides were analysed. d, Densities of predicted strong U1-recognition 
sites in introns of lncRNA and mRNA genes in humans (top; lncRNA, n = 15,885; 
mRNA, n = 130,431) and mice (bottom; lncRNA, n = 6,467; mRNA, n = 72,688). 
e, Densities of predicted strong U1-recognition sites in the sense strand (top) 
and reverse-complementary strand (bottom) of mRNA with different levels of 
chromatin-binding activity (n = 7,773 each) and lncRNA (n = 1,038) transcripts 
(from low to high: green, low; orange, moderate; red, high; dark red, lncRNA). 
f, Comparison of expression (top) and fold enrichment of U1 RAP-RNA signals 
in introns (bottom) in mRNA genes (n = 2,262 each group) and lncRNAs (n = 97) 
with different levels of chromatin binding. LncRNAs that show detectable 
expression (FPKM values greater than 1) and no overlap with protein-coding 
genes were used. In b-f, P-values are based on two-sided t-tests. Box plots show 
5th, 25th, 50th, 75th and 95th percentiles, with median values labelled by the 
box plots. 
Extended Data Fig. 5 | U1 snRNP regulates the chromatin retention of 
lncRNAs. a, Scheme illustrating inhibition of U1 and/or U2 snRNA by AMO 
nucleofection followed by strand-specific RNA-seq analysis of total RNAs 
isolated from whole cells and from three subcellular fractions: cytoplasm 
(cyto), nucleoplasm (nuc) and chromatin (chr). CT, scramble control AMO. 
b, RT-qPCR of the chromatin/non-chromatin ratios (i) and relative expression 
(ii) of representative lncRNAs in mESCs after 2-h treatments with the control 
(CT), U1, U2 or U1/2 AMOs (upper panels), or after 72 h of knockdown with 
scramble (Scr), SNRNP70 or SPEN shRNAs (bottom panels). (Non-chromatin 
refers to cytoplasm plus nucleoplasm.) Means ± s.e.m. are shown. n = 2 
biological replicates. Consistent with inhibition of U1 snRNA, knockdown of 
SNRNP70, a core component of U1 snRNP, also led to decreased chromatin 
signals of individual lncRNAs. c, Venn diagram showing lncRNAs with 
decreased chromatin associations upon U1, U2 or U1/2 inhibition. Compared 
with U1 AMO (337), inhibition of U2 snRNA affected a smaller number (91) of 
lncRNAs, most of which (69) belong to U1-regulated lncRNAs. Inhibition of both 
U1/2 snRNA did not elicit a stronger effect than inhibition of U1 alone. Note that 
inhibition of both U1 and U2 used a lower concentration of U1 and U2 AMOs 
(50 µM each) than the 75 µM used to inhibit U1 alone. d, Metaplots of whole-cell 
RNA-seq reads in intron-exon junctions upon inhibition of U1 and/or U2 snRNA 
(i) or upon SNRNP70^AID degradation (ii). RPM, read count per million mapped 
reads. e, Volcano plot showing the fold change (log2) of the chromatin/non- 
chromatin ratio of lncRNAs (n = 1,282) upon inhibition of U1 snRNA (using 
AMOs, 2 h). P-values obtained by two-sided t-test with three biological 
replicates. f, Heat maps showing the ratio of chromatin/non-chromatin, the 
ratio of each fraction versus the normalized total RNA contents, and the 
expression level of 337 lncRNAs with decreased chromatin association upon U1 
AMO inhibition (i); or 346 chromatin-downregulated lncRNAs in total RNA-seq 
upon SNRNP70^AID degradation (ii). Upon U1 AMO inhibition or SNRNP70^AID 
degradation, these lncRNAs show decreased chromatin association, while 
their relative abundance in cytoplasmic and nucleoplasm fractions even 
increased. g, Diagram showing the construction of the SNRNP70^AID mESC line, 
which expresses an AID- and FLAG-tagged SNRNP70 (SNRNP70^AID) in a 
transgene with the two endogenous SNRNP70 alleles inactivated. TIR1, DNA 
expressing E3 ligase for the AID system. h, i, Time-course expression analysis 
by western blot (h) and by RT-qPCR (i) in SNRNP70^AID mESCs upon addition of 
auxin for 0-12 h. n = 2 independent experiments. For panel i, means ± s.e.m. are 
shown. RNA expression was normalized by a lacZ spike-in that was added into 
the same numbers of cells. At 4 h of auxin treatment, the expression of p53 
protein and the RNA transcripts analysed exhibited modest changes compared 
to the changes after 8-12 h. j, Auxin-induced rapid degradation of SNRNP70^AID 
did not affect the phosphorylation of Pol II at Ser 2 (S2P). ACTB serves as a 
loading control. n = 3 independent experiments. k, RT-qPCR analysis showing 
enrichments of candidate lncRNAs captured by U1 ChIRP before (0 h) and after 
(4 h) SNRNP70^AID depletion. U1 binding to its target candidate lncRNAs was 
severely impaired in auxin-treated mESCs (4 h), indicating that the function of 
U1 snRNA requires an intact snRNP complex. Data shown as mean ± s.e.m., from 
three biological replicates. P-values obtained by two-sided t-test. l, The overlap 
of a total of 531 U1-snRNP-regulated, chromatin-downregulated lncRNAs from 
U1 AMO inhibition (red) or SNRNP70^AID degradation (blue). P-values obtained 
by exact hypergeometric probability. m, Heat map showing changes of the 
chromatin/non-chromatin ratio upon U1 AMO inhibition (i) or degradation of 
SNRNP70^AID (ii) for the set of 531 U1-snRNP-regulated, chromatin- 
downregulated lncRNAs shown in l. The patterns of chromatin/non-chromatin 
ratio changes are highly similar for both treatments. n, Analysis of U1 RAP-RNA 
signals (i), expression levels (FPKM) (ii) and transcript lengths (iii-v) in the 
various sets of lncRNAs shown in l. Box plots show 5th, 25th, 50th, 75th and 95th 
percentiles, with median values labelled by the box plots and sample sizes 
(n > 337) labelled on the x axis. We further divided mESC-expressed lncRNAs 
into three groups on the basis of their transcript length (from shortest to 
longest). iv, v, The chromatin/non-chromatin ratio of total RNA-seq (iv) and the 
numbers of U1-snRNP-regulated and unregulated lncRNAs (v) in each group 
(n = 427). Longer lncRNAs appear to exhibit stronger chromatin retention and 
to be preferentially affected upon SNRNP70^AID degradation. P-values obtained 
by two-sided Mann-Whitney test for i-iv, one-sided Fisher’s exact test for v. For 
gel source data, see Supplementary Fig. 1. 
Extended Data Fig. 6 | Sequencing tracks of representative lncRNA and 
mRNA genes. a, Malat1. b, Kcnq1ot1. c, Tsix. d, Pvt1 and its upstream protein- 
coding gene, MYC. e, Meg3. f, Rian. g, Lncenc1. In the Tsix locus (c), the top three 
sets of tracks show the predicted U1-recognition sites (strong or medium, 
indicated by blue vertical lines), U1 RAP-RNA-seq with formaldehyde (FA) 
crosslinking (including enrichment ratio, reads signals of U1 RAP and the input 
control), and the REL-seq result. The lower two sets of tracks show total RNA- 
seq after AMO treatments, and total and polyA RNA-seq and TT-seq of 
SNRNP70^AID mESCs at 0 h or 4 h of auxin treatment. Total RNA-seq analysis 
revealed decreased chromatin levels of Tsix transcripts after inhibition of U1 
snRNA or degradation of SNRNP70^AID. Intriguingly, polyA-seq showed a more 
dramatic increase of polyadenylated Tsix transcripts in the whole-cell sample 
compared with that in the chromatin fraction upon SNRNP70^AID degradation. 
Coincidentally, a U1-associated enChr (black vertical line) was identified by 
REL-seq at roughly 2.2 kb upstream of the annotated transcription end site 
(TES) of Tsix. In addition, strong binding of U1 snRNA was detected extensively 
across the whole exon at the 3’ end of Tsix. Thus, U1 snRNP may inhibit the 
PAS of Tsix to promote its degradation and chromatin retention. Inhibition 
of U1 snRNP by degradation of SNRNP70^AID thus appears to enhance 
polyadenylation, stability and nuclear export of Tsix, leading to the observed 
increases of polyA RNA in both whole-cell and chromatin fractions. 
Nevertheless, the chromatin/whole-cell ratio of polyA Tsix still decreases after 
SNRNP70^AID degradation. Two well spliced lncRNAs, Meg3 (e) and Rian (f), show 
few intronic signals in both total and polyA RNA-seq. In d-g, U1 RAP-RNA-seq 
(FA), total RNA-seq after AMO treatments, and total and polyA RNA-seq and TT- 
seq tracks of SNRNP70^AID mESCs at 0 h or 4 h auxin treatment are shown. Some 
very large lncRNAs, such as Kcnq1ot1 (b, 83.4 kb in the genome sequence), Tsix 
(c, 53.4 kb) and Pvt1 (d, 213 kb), show decreased RNA signals in the downstream 
gene body upon U1 AMO treatment; however, this effect was less obvious in 
SNRNP70^AID mESCs. 


Extended Data Fig. 7 | Analysis of the direct causality of U1 snRNP in 
regulating lncRNA-chromatin retention. a, Summary of experiments that we 
carried out to systematically investigate the effects of acute inhibition of U1 
snRNP on lncRNA-chromatin associations, transcription dynamics, and RNA 
processing and decay. b, Volcano plot of polyA RNA-seq showing the fold 
change (log2) in the chromatin/whole-cell ratio of lncRNAs upon SNRNP70^AID 
degradation. Red dots and deep grey dots indicate lncRNAs that show a 
significant decrease or increase, respectively, in their chromatin/cell ratio by 
comparing SNRNP70^AID (4-h auxin treatment) versus control (0 h) (P < 0.05; 
two-sided t-test with three biological replicates). c, Correlation plot of polyA 
and total RNA-seq analysis of SNRNP70^AID mESCs. The set of chromatin- 
downregulated lncRNAs shows significantly correlated changes in chromatin 
localization upon degradation of SNRNP70^AID (n = 346). d, SLAM-seq analysis of 
chromatin and non-chromatin (cytoplasm and nucleoplasm) fractions in 
SNRNP70^AID mESCs. SNRNP70^AID mESCs were treated with or without auxin for 
2 h and then labelled with 4sU for 3 h. After chemical conversion of the 
incorporated 4sU nucleotides to cytidine, RNA from various subcellular 
fractions was isolated for 3’-end polyA-seq library construction. e, Box plots 
showing the conversion rate detected by SLAM-seq of chromatin fractions in 
the last exon of genes with detectable new transcript (n = 24,097) before (0 h) 
and after (4 h) SNRNP70^AID degradation. Box plots show 5th, 25th, 50th, 75th 
and 95th percentiles. f, Pearson correlation analysis of the change in 
chromatin/non-chromatin ratio for new versus all transcripts of lncRNAs with 
detectable new transcripts (n = 492) identified by SLAM-seq. g, Volcano plots 
showing expression changes of mESC-expressed lncRNAs (n = 1,282) after 
treatment with U1 AMO (i) or degradation of SNRNP70^AID (ii). Chromatin- 
downregulated or non-downregulated lncRNAs were further classified into 
‘downregulated (down-)’, ‘upregulated (up-)’ or ‘unchanged’ according to their 
expression changes in whole-cell samples. LncRNAs with reduced chromatin 
association upon inhibition of U1 snRNA or SNRNP70^AID do not show greater 
downregulated expression by comparison with all lncRNAs. Only a small 
proportion of them (84 of 337 U1-regulated and 76 of 346 SNRNP70-regulated) 
show decreased transcript levels. P-values obtained by two-sided t-test; n = 3 
biological replicates. h, Metagene analysis of whole-cell RNA-seq reads for the 
set of U1-snRNP-regulated, chromatin-downregulated lncRNAs in mESCs. Only 
lncRNAs that do not overlap with any protein-coding gene on the same strand 
were analysed (n = 239). Similar read-distribution patterns were observed in 
control cells and in cells treated with U1, U2 or U1/U2 AMOs (i) or in cells 
subjected to auxin-induced degradation of SNRNP70^AID for 0 h or 4 h (ii). Thus, 
rapid inhibition of U1 snRNP did not cause global transcription termination, 
although we did observe decreased downstream RNA signals in a few very long 
lncRNAs, such as Kcnq1ot1 (83 kb), in agreement with the proposed role of U1 
telescripting in protecting the transcription integrity of very large 
transcripts”>. In addition, we conjecture that the slight decreases in total 
transcript levels are likely to be post-transcriptionally mediated by RNA 
degradation instead of an effect of U1 inhibition on nascent transcription (see 
panels i-k below). i, Metagene analysis of TT-seq signals in all mESC-expressed 
genes (n = 10,675) (i) and chromatin-downregulated (n = 239) and non- 
downregulated lncRNAs (n = 151) (ii) upon SNRNP70^AID degradation. Only 
lncRNAs that do not overlap with any protein-coding gene on the same strand 
were analysed. j, k, Metagene analysis of ChIP-seq signals of Pol II S5P (‘paused’ 
Pol II; i) and S2P (‘elongating’ Pol II; ii) across the gene body and upstream/ 
downstream 5-kb region of all mESC-expressed genes (n = 10,675) and 
unexpressed genes (n = 7,933) (j) or chromatin-downregulated (n = 155) and 
non-downregulated (n = 66) lncRNAs upon SNRNP70^AID degradation (k). Only 
lncRNAs that do not overlap with any protein-coding gene on either strand 
were analysed. For h-k, shadings represent 95% confidence intervals for the 
average enrichment. l, RT-qPCR analysis of the knockdown efficiency and 
lncRNA expression level change (i), chromatin/non-chromatin ratio (ii) and 
relative expression (iii) in SNRNP70^AID mESCs depleted of EXOSC3 by RNA 
inhibition. Knockdown was analysed at 72 h after shRNA viral infection. The 
observation of increased expression for most lncRNAs analysed is consistent 
with a role of EXOSC3 in mediating RNA degradation (i). Knockdown of EXOSC3 
blocked RNA degradation for most lncRNAs analysed (iii), but failed to rescue 
their decreased chromatin associations induced by auxin (ii). Thus, the effect 
of U1 snRNP in promoting lncRNA-chromatin binding is not caused by 
increased RNA degradation. Means ± s.e.m. are shown; P-values obtained by 
two-sided t-test for three biological replicates. 
Extended Data Fig. 8 | U1 snRNP regulates lncRNA-chromatin association 
through its interaction with transcriptionally engaged Pol II. a, Western 
blots of SNRNP70, SNRPC, Pol II and marker proteins with no treatment (i), or 
treated with DMSO, triptolide (TPL), or flavopiridol (Flav) for 1 h (ii), or in 
nuclear fractions sequentially extracted with increasing concentrations of salt 
(NaCl) (iii). For i, iii, n = 3 independent experiments; for ii, n = 2 independent 
experiments. b, Scheme for co-immunoprecipitation of the chromatin 
fraction. Benzonase was used to digest all DNA and RNA and to elute proteins 
from chromatin. c, Proteins captured by SNRNP70 coIP-SILAC. Native 
chromatin extracts that were released by benzonase were subjected to anti- 
FLAG co-immunoprecipitation of SNRNP70^AID protein (FLAG-tagged) coupled 
with stable-isotope labelling by amino acids in cell culture, followed by mass 
spectrometry. SNRNP70 purification captured the U1 and U2 snRNPs, as well as 
several components of the U4/6-U5 snRNPs and other splicing factors. Notably, 
the SNRNP70 interactome also identified proteins involved in transcription 
regulation, such as POLR2A, SPT5 and SPT6. d, Chromatin-based co- 
immunoprecipitation (in the presence of benzonase) and western blot analysis 
confirming the interactions between U1 snRNP and the proteins identified in c. 
Panels i, ii show co-immunoprecipitation of FLAG-tagged SNRNP70^AID protein. 
SNRNP70^AID mESCs treated with auxin for 4 h serve as the negative control. 
Panel iii shows that co-immunoprecipitation of endogenous SNRPC captured 
the total Pol II. Panel iv shows that co-immunoprecipitation of Pol II S2P 
captured SNRNP70 under a physiological salt condition (150 mM) but not in 
high-salt conditions, suggesting dynamic associations between U1 snRNP and 
engaged Pol II. n = 3 independent experiments. e, RT-qPCR analysis of the 
expression of representative lncRNAs and U1 snRNA upon treatment with 
DMSO (control), 1 µM of E7107 (inhibitor of U2 snRNP), 1 µM of flavopiridol 
(transcription inhibitor) or 1 µM of triptolide (transcription inhibitor) for 1 h, or 
with 100 µM of isoginkgetin (inhibitor of U4/6-U5 snRNP) for 2 h. Means ± s.e.m. 
are shown; P-values obtained by two-sided t-test with three biological 
replicates. f, Metagene analysis of U1 RAP-DNA-seq signals for all Ensembl 
genes in mESCs treated with DMSO or with flavopiridol (Flav) for 1 h. 
Flavopiridol treatment led to reduced U1 RAP-DNA signals downstream of the 
TSS across the gene body. Red shading represents 95% confidence intervals for 
the average enrichment. g, Genome-browser views of U1 RAP-DNA-seq and Pol 
II ChIP-seq at the Malat1 and Tsix loci. 8WG16, hypophosphorylated Pol II; NTD, 
N-terminal domain of Pol II, representing the total Pol II. h, Relative chromatin/ 
non-chromatin ratio (i), expression-level change of representative lncRNAs (ii) 
and knockdown efficiency of targeted genes (iii) after mESCs were treated with 
PRPF8 or SNRNP200 shRNAs for 72 h. Means ± s.e.m. are shown; P-values 
obtained by two-sided t-test with three biological replicates. For gel source 
data, see Supplementary Fig. 1. 


Article 


[Extended Data Fig. 9 graphic, panels a-g. a, U1 RAP-DNA and input metaplots around TSSs and enhancers (−5 kb to +5 kb). b, c, Sequencing tracks at chr6:122,284,671-122,302,550 and chr6:52,023,101-52,065,140 (sense strand only, mm9). d-g, Metaplots of uaRNA and eRNA RNA-seq signals (solid line, sense; dashed line, antisense) in whole-cell, chromatin, cytoplasmic and nucleoplasmic fractions after AMO nucleofection (CT, U1, U2, U1/2) or auxin treatment of SNRNP70-AID cells (0 h, 4 h; two replicates).]


Extended Data Fig. 9 | Inhibition of U1 and U2 snRNPs downregulates the 
chromatin association of uaRNAs and eRNAs. a, Metaplots of U1 RAP-DNA-seq, 
showing enrichment of U1 snRNA in the chromatin proximity of regulatory 
DNA sequences. The top panel shows a ±5-kb window flanking TSSs of Ensembl 
genes that do not overlap with any other gene within 2 kb (n = 18,972), and the 
bottom panel shows a ±5-kb window flanking enhancers that do not overlap 
with a gene within 2 kb (n = 3,767). b, c, Sequencing tracks showing chromatin 
and cytoplasmic RNA-seq signals of uaRNAs/eRNAs in the PHC1 promoter (b) 
and Haunt enhancer (c). d, e, Metaplots of RNA-seq reads of uaRNAs from 
whole cells and subcellular fractions in AMO-treated samples (d) and 
SNRNP70-AID-degraded samples (e) in a ±2-kb window flanking the TSSs of 
Ensembl genes that do not overlap with any other gene within 2 kb (n = 18,972). 
The uaRNAs show upregulated overall expression and more dramatic increases 
in the cytoplasmic and nucleoplasmic fractions after U1 or U1/2 inhibition, 
while there are comparable (U1 AMO) or slightly decreased (U1/2 AMO) uaRNA 
signals in the chromatin fraction at the TSS-to-1-kb upstream region. 
f, g, Metaplots of RNA-seq reads of eRNAs from whole cells and subcellular 
fractions in AMO-treated samples (f) and SNRNP70-AID-degraded samples (g) 
in a ±2-kb window flanking enhancers that do not overlap with any gene within 
2 kb (n = 3,767). 
Extended Data Fig. 10 | U1 snRNP tethers and mobilizes lncRNAs to 
chromatin. a, b, Immunofluorescence analysis of SNRNP70 (a) and SC35 (b) in 
auxin-treated SNRNP70-AID and wild-type mESCs. SNRNP70-AID ESCs marked by 
stably integrated GFP transgenes (GFP+, highlighted with dashed white lines) 
and GFP-negative wild-type mESCs (GFP−) were mixed and treated with auxin 
for 4 h. n = 3 independent experiments. c, Quantification of the numbers of 
Malat1 speckles (equivalent diameter greater than 0.5 μm) identified by RNA 
FISH (Fig. 3c). Box plots show 5th, 25th, 50th, 75th and 95th percentiles, with 
median values labelled by the plots and sample sizes (wild type, n = 52; 
SNRNP70-AID, n = 88) labelled on the x axis. P-values obtained by two-sided 
Mann-Whitney test. d, Sequencing tracks of Malat1 ChIRP-DNA-seq and TT-seq 
in Malat1 (i) and representative loci that are targeted by Malat1 (ii). In both 
panels, the top set of tracks show Malat1 ChIRP-seq upon SNRNP70-AID 
degradation (at 0 h and 4 h) or upon treatment with DMSO control or triptolide 
(TPL). The bottom set of tracks show TT-seq upon SNRNP70-AID degradation (at 
0 h and 4 h). In panel ii, TT-seq signals on both plus and minus strands are 
shown. We used the mm9 mouse genome assembly. e, qPCR analysis of Malat1 
ChIRP-DNA of SNRNP70-AID mESCs before (0 h) and after (4 h) treatment with 
auxin. Data are shown as mean ± s.e.m., for three biological replicates. P-values 
obtained by two-sided t-test. f, Mechanistic representation of U1 snRNP and its 
interplay with Pol II and PASs in regulating the tethering and mobilization of 
noncoding RNA on chromatin. Notably, lncRNAs, uaRNAs and eRNAs share 
many features, including chromatin association, inefficient or absent splicing 
and polyadenylation, low-level expression and short half-lives. LncRNAs 
in general are enriched with 5′ U1-recognition sites but depleted of 3′ splice 
sites. For uaRNAs and eRNAs, U1 binding on chromatin is enriched at enhancer 
DNA sequences and TSSs (the 5′ end of uaRNAs), even though U1-recognition 
sites are depleted in uaRNA DNA sequences. U1 snRNP may bind uaRNAs 
and eRNAs through co-transcriptional U1-Pol II interactions. Splicing releases 
the U1 snRNP from pre-mRNAs. However, lncRNAs, uaRNAs and eRNAs remain 
associated with U1 snRNPs because of inefficient or absent splicing (the lack of 
3′ss could be one reason). Through its interaction with transcriptionally 
engaged Pol II, U1 snRNP is tethered to chromatin and subsequently retains its 
associated lncRNAs and ncRNAs on chromatin. Meanwhile, the inhibitory 
function of U1 snRNP on polyadenylation promotes transcription elongation at 
cryptic PASs and RNA decay at authentic PASs. Rapid RNA turnover 
renders these transcripts less likely to leave the chromatin, contributing in part 
to their enrichment on chromatin and lack of nuclear export. Although the 
properties of chromatin binding and instability appear to be intrinsically 
coupled for lncRNAs and chromatin-bound unstable ncRNAs, U1 snRNP and the 
RNA-degradation machinery appear to play independent yet synergistic roles 
in facilitating RNA-chromatin association. Coupled chromatin association and 
instability of many lncRNA transcripts may contribute to the observed 
cis-targeting and regulatory functions in their chromatin neighbourhoods. Most 
short-lived lncRNA transcripts spread locally within their neighbourhoods, 
while a few stable and abundant lncRNAs, such as Malat1, exist long enough to 
be trans-targeted to other genomic sites. For stable lncRNAs, persistent 
binding with U1 snRNP, and perhaps engaged Pol II, may drive lncRNA 
mobilization to distinct nuclear compartments (such as nuclear speckles) or to 
thousands of trans genomic sites (in the case of Malat1). Possibly, these highly 
expressed lncRNAs have developed evolutionarily to take advantage of the 
U1-tethering mechanism to achieve trans functions. In addition to U1 snRNP, U2 
snRNP (but not the splicing reaction), and perhaps other factors, contributes 
to this process. 
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Data collection Bio-Rad CFX384 Real-Time System was used for RT-qPCR data collection. Illumina HiSeq 2500 and HiSeq X Ten were used for sequencing 
data collection. A Nikon A1RMP microscope and related software were used for collecting and analysing FISH and IF data. 


Data analysis For RNA-seq, REL-seq, TT-seq, and ChIP-seq data analysis, Bowtie v2.2.3, Bowtie v1.0.0, TopHat v2.0.10, Cufflinks v2.1.1, BEDTools v2.17.0, 
MACS v1.4.2 and Novoalign v3.02.05 were used. Integrative Genomics Viewer v2.3.97 was used for signal visualization of sequencing 
data. RStudio v3.4.3, Excel 2016, and GraphPad Prism v6.0 were used for statistical analysis. FlowJo v7.6.1 was used for FACS analysis. 
ImageJ v1.8.0 was used for quantification of western blots. The RNAfold v2.4.13 web server 
(http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) was used for RNA secondary structure prediction. 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


The main data supporting the findings of this study are available within the article and Supplementary Information files. Sequencing data have been deposited in the 
GEO database under the accession numbers GSE107131 and GSE134287, and will be available after the manuscript is formally accepted. All other data 
supporting the findings of this study are available from the corresponding author upon reasonable request. 
Sample size No statistical calculations were performed to estimate sample size. For REL-seq analysis, we included eight libraries and two representative 
cell lines (mouse embryonic stem cells and human HEK 293T), which are sufficient for screening technologies. See Methods and 
Supplementary Note 1. For RNA-seq, we included three independent biological replicates, which are sufficient for sequencing analysis. 
Sample sizes for RT-qPCR and other experiments were determined on the basis of previously published studies and experimental experience. 

Data exclusions No data were excluded. 

Replication All experiments were repeated (biological repeats) at least twice. All attempts at replication were successful. 


Randomization No bias was introduced when harvesting samples or collecting data. 


Blinding Experiments in this study were not blinded, because the sample-harvesting process made blinding impossible. Data were collected in 
parallel to minimize bias. 
Antibodies 


Antibodies used 


Validation 


Eukaryotic cell lines 


Antibodies used for IP and western blot: SNRNP70 (Santa Cruz, sc-390988, E-4, lot: D0915, 1:500); SNRPC (Abcam, ab192028, 
EPR16034, lot: GR180605-1, 1:5000); Histone H3 (Easybio, BE2013-100, lot: 80920815, 1:10000); TUBULIN (CWBIO, CW0098M, 
1:2000); ACTB (Abclonal, AC004, lot: 01270/30337, 1:5000); RNAPII CTD (Abcam, ab52202, 1:2000); Hypophosphorylated RNAPII 
(Covance, MMS-126R, 8WG16, 1:2000 for western blot; 1:200 for IP); RNAPII Ser5P (CST, 13523, D9N5I, lot: 1, 1:5000 for western 
blot; 1:300 for IP and ChIP); RNAPII Ser2P for IP and western blot (CST, 13499, E1Z3G, lot: 1, 1:5000 for western blot; 1:300 for IP 
and ChIP); RNAPII Ser2P for western blot (Covance, MMS-129R, H5, lot: 14943202, 1:1000); p53 (Abclonal, A5761, 1:1000); 
SNRNP200 (Abclonal, A6063, lot: 0102340101, 1:1000); PRPF8 (Sangon, D163565, lot: D163565-0025, 1:1000); p-SF3B1 
(Abclonal, AP0844, lot: 2158550301, 1:2000); Spt6 (CST, 15616S, D6J9H, lot: 1, 1:2000); SC35 (Abcam, ab11826, 1:500 for IF). 


Anti-SNRNP70 (Santa Cruz, sc-390988, E-4, lot: D0915), this antibody has been claimed to react with mouse SNRNP70 by the 
manufacturer. Validation data for the antibodies used can be found as follows: https://www.scbt.com/p/u1-snrnp-70-antibody-e-4. 
Anti-SNRPC (Abcam, ab192028, EPR16034), this antibody has been claimed to react with mouse .. by the manufacturer. 
Validation data for the antibodies used can be found as follows: https://www.abcam.com/u1-c-antibody-epr16034-ab192028.html. 

Anti-Histone H3 (Easybio, BE3021-100), this antibody has been claimed to react with mouse Histone H3 by the manufacturer 
(http://www.bioeasytech.com/home/product/article/id/11.html ) and also confirmed by our data. 

Anti-TUBULIN (CWBIO, CW0098M), this antibody has been claimed to react with mouse TUBULIN by the manufacturer. 
Validation data for the antibodies used can be found as follows: https://www.cwbiotech.com/uploads/ 
websitepdf/98be23fc-9451-4a27-93f6-02375421a1e5.pdf 


Anti-ACTB (Abclonal, AC004), this antibody has been claimed to react with mouse ACTB by the manufacturer. Validation data for 
the antibodies used can be found as follows: https://abclonal.com.cn/catalog/AC004. 

Anti-RNAPII CTD (Abcam, ab52202), this antibody has been claimed to react with the CTD of mouse Polymerase II by the 
manufacturer. Validation data for the antibodies used can be found as follows: 
https://www.abcam.com/rna-polymerase-ii-ctd-repeat-ysptsps-antibody-ab52202.html. This antibody has been used by Li et al, 2015 to detect Pol II by western blot, and also 
confirmed by our data. 
Anti-Hypophosphorylated RNAPII (Covance, MMS-126R, 8WG16), this antibody has been claimed to react with mouse 
Polymerase II by the manufacturer. This antibody has been used for ChIA-PET by Bertolini et al, 2019, and also confirmed by our 
data. 
Anti-RNAPII Ser5P (CST, 13523, D9N5I), this antibody has been claimed to react with mouse Ser5 phosphorylated Pol II by the 
manufacturer. Validation data for the antibodies used can be found as follows: 
https://www.cellsignal.com/products/primary-antibodies/phospho-rpb1-ctd-ser5-d9n5i-rabbit-mab/13523. This antibody has been used by Jiang et al, 2018 for ChIP assay, 
and also confirmed by our data. 
Anti-RNAPII Ser2P (CST, 13499, E1Z3G), this antibody has been claimed to react with mouse Ser2 phosphorylated Pol II by the 
manufacturer. Validation data for the antibodies used can be found as follows: 
https://www.cellsignal.com/products/primary-antibodies/phospho-rpb1-ctd-ser2-e1z3g-rabbit-mab/13499. 
Anti-RNAPII Ser2P (Covance, MMS-129R, H5), this antibody has been claimed to react with mouse Ser2 phosphorylated Pol II by 
the manufacturer. This antibody has been used for ChIP by Espinosa et al, 2003, and also confirmed by our data. 

Anti-p53 (Abclonal, A5761), this antibody has been claimed to react with mouse p53 by the manufacturer. Validation data for the 
antibodies used can be found as follows: https://abclonal.com.cn/catalog/A5761. 
Anti-SNRNP200 (Abclonal, A6063), this antibody has been claimed to react with mouse SNRNP200 by the manufacturer. 
Validation data for the antibodies used can be found as follows: https://abclonal.com.cn/catalog/A6063. 

Anti-PRPF8 (Sangon, D163565), this antibody has been claimed to react with mouse PRPF8 by the manufacturer. Validation data 
for the antibodies used can be found as follows: https://www.sangon.com/productDetail?productInfo.code=D163565. 
Anti-p-SF3B1 (Abclonal, AP0844), this antibody has been claimed to react with mouse phosphorylated SF3B1 by the 
manufacturer. Validation data for the antibodies used can be found as follows: https://abclonal.com.cn/catalog/AP0844. 
Anti-SPT6 (CST, 15616S, D6J9H), this antibody has been claimed to react with mouse SPT6 by the manufacturer. Validation data 
for the antibodies used can be found as follows: https://www.cellsignal.com/products/primary-antibodies/spt6-d6j9h-rabbit- 
mab/15616. 

Anti-SC35 (abcam, ab11826), this antibody has been claimed to react with mouse SC-35 by the manufacturer. Validation data for 
the antibodies used can be found as follows: https://www.abcam.com/sc35-antibody-sc-35-nuclear-speckle-marker- 
ab11826.html. This antibody has been used for IF by Gavrilov et al, 2013, and also confirmed by our data. 


Policy information about cell lines 


Cell line source(s) 


Authentication 


Mycoplasma contamination 


Commonly misidentified lines 
(See ICLAC register) 


HEK293T cells were obtained from ATCC (CRL-3216), HeLa cells were obtained from ATCC (CCL-2), and mouse ESCs (E14T, 
46C) were a gift from Austin Smith's lab. 


The cell lines have been used in the lab for over 3 years, so authentication was not performed. The SNRNP70-AID cell line 
constructed in this study was confirmed by PCR and western blot. 


All cell lines tested negative for mycoplasma contamination by PCR. 


No cell lines used in this study were found in the database of commonly misidentified cell lines that is maintained by ICLAC 
and NCBI BioSample. 




ChIP-seq 


Data deposition 


Confirm that both raw and 


Data access links 
May remain private before publication. 


Files in database submission 


Genome browser session 
(e.g. UCSC) 


Methodology 


Replicates 


Sequencing depth 


Antibodies 


Peak calling parameters 


Data quality 


Software 


Flow Cytometry 


final processed data have been deposited in a public database such as GEO. 


Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks. 


https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE134287 (token: qvmvsqgoxjglxwj) 


70AID-0h_Pol2S2P.bed; 70AID-4h_Pol2S2P.bed; 70AID-0h_Pol2S5P.bed; 70AID-4h_Pol2S5P.bed 


Provide a link to an anonymized genome browser session for "Initial submission" and "Revised version" documents only, to 
enable peer review. Write "no longer applicable" for "Final submission" documents. 


ChIP-seq for each condition was replicated once. 


70AID-0h_Pol2S2P: total reads--8762112, unique mapping ratio--75.41%, read length--143 nt, paired-end; 
70AID-4h_Pol2S2P: total reads--8373124, unique mapping ratio--73.72%, read length--143 nt, paired-end; 
70AID-0h_Pol2S5P: total reads--7259811, unique mapping ratio--90.16%, read length--150 nt, paired-end; 
70AID-4h_Pol2S5P: total reads--7373872, unique mapping ratio--90.49%, read length--150 nt, paired-end. 
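
As a rough worked check of the depth figures above, the number of uniquely mapped reads in each library can be estimated as total reads multiplied by the unique mapping ratio. The Python sketch below is illustrative only and is not part of the authors' pipeline; the helper name is hypothetical, and the sample labels assume the 70AID-0h/70AID-4h naming of the deposited files.

```python
# Reported sequencing depth per library: (total reads, unique mapping ratio in %).
# Sample labels assume the 70AID-0h / 70AID-4h naming of the deposited files.
DEPTH = {
    "70AID-0h_Pol2S2P": (8762112, 75.41),
    "70AID-4h_Pol2S2P": (8373124, 73.72),
    "70AID-0h_Pol2S5P": (7259811, 90.16),
    "70AID-4h_Pol2S5P": (7373872, 90.49),
}


def uniquely_mapped(total_reads: int, ratio_percent: float) -> int:
    """Estimate uniquely mapped reads (hypothetical helper), rounded to the nearest read."""
    return round(total_reads * ratio_percent / 100)


if __name__ == "__main__":
    for sample, (total, ratio) in DEPTH.items():
        print(f"{sample}: ~{uniquely_mapped(total, ratio):,} uniquely mapped reads")
```

On these figures, each library retains roughly 6 to 7 million uniquely mapped reads.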


Ser2P Pol II: Cell Signaling Technology, 13499; Ser5P Pol II: Cell Signaling Technology, 13523 
bowtie2 -x ./mm10 -1 sample_R1.fastq -2 sample_R2_.fastq -k 1 -S sample.sam 


macs14 -t sample.sam -c input.sam -g mm -n sample_input -p 1e-5 -B -S 
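
The two commands above can be read as a two-step pipeline: paired-end alignment with Bowtie 2, then peak calling against an input control with MACS 1.4. A minimal Python sketch that only assembles the same argument lists (it does not run either tool) might look like the following; the function names and default file names are hypothetical, and the flags are copied from the commands as reported rather than chosen independently.

```python
import shlex


def bowtie2_cmd(index="./mm10", r1="sample_R1.fastq",
                r2="sample_R2_.fastq", out="sample.sam"):
    """Argument list mirroring the reported paired-end alignment command."""
    return ["bowtie2", "-x", index, "-1", r1, "-2", r2, "-k", "1", "-S", out]


def macs14_cmd(treatment="sample.sam", control="input.sam", name="sample_input"):
    """Argument list mirroring the reported peak-calling command (p-value cutoff 1e-5)."""
    return ["macs14", "-t", treatment, "-c", control, "-g", "mm",
            "-n", name, "-p", "1e-5", "-B", "-S"]


if __name__ == "__main__":
    for cmd in (bowtie2_cmd(), macs14_cmd()):
        # Print the shell form; to execute, pass cmd to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Keeping each command as a list rather than a single string avoids shell-quoting problems if sample names contain spaces.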


70AID-0h_Pol2S2P: FC > 5, FDR < 5%, 5890 peaks; 
70AID-4h_Pol2S2P: FC > 5, FDR < 5%, 7055 peaks; 
70AID-0h_Pol2S5P: FC > 5, FDR < 5%, 14780 peaks; 
70AID-4h_Pol2S5P: FC > 5, FDR < 5%, 16170 peaks. 


Bowtie2, Bedtools, macs, ngs.plot 
Gating strategy 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers). 
All plots are contour plots with outliers or pseudocolor plots. 


A numerical value for number of cells or percentage (with statistics) is provided. 


Cells expressing a GFP reporter fused with different inserts were harvested and analysed directly by FACS to measure the 
change in GFP intensity. Flow cytometry was used for a mini-screen; no plots are shown. The summary of GFP intensity changes is 
shown in Extended Data Fig. 3. 

BDCalibur 

Data collection: BD FACSDiva 8.0; Data analysis: FlowJo 7.6.1 


No sorting was conducted. 


FSC/SSC were used to discern single cells from doublets/multiple cells. Samples without GFP reporter transfection were used to 
establish boundaries between negative and positive cells. 


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
HACKS TO HELP RESEARCHERS 
PUT WORDS ON THE PAGE 


Productivity coaches, boot camps and online meet-ups can assist 
scientists with getting their writing done. By Roberta Kwok 


Hannah James started to fret about 
her unfinished thesis in March 2017. 
The then-fourth-year PhD student 
in archaeological geochemistry had 
to write about 60,000 words on her 
analyses of human teeth. Every couple of 
weeks since 2015, she had attended sessions 
organized by her university’s research skills 
and training group that helped students to 
focus on writing while holed up in a room on 
campus for several hours or an entire day. 

But James was having trouble clarifying the 
main points of her thesis, and needed a longer 
block of time to concentrate and put her ideas 
on paper. And it was hard for her to work 
outside the group sessions: she found herself 
distracted by e-mail or minor details in her 
graphs. James was hoping to submit her thesis 
to the Australian National University (ANU) in 
Canberra in a year’s time, but had produced 
only 10,000 words that seemed usable. “I just 
hit a point of panic,” she says. 

© 2020 Springer Nature Limited. All rights reserved. 

So she signed up for a three-day programme 
that October called Thesis Boot Camp. For 
many ANU students who are nearing deadlines, 
joining the programme signals desperation, 
says Inger Mewburn, the university’s director 
of researcher development and founder of The 
Thesis Whisperer, a blog about finishing a PhD. 
“There’s a lot of crying.” 

With 27 other participants, James attended 
classroom sessions that lasted from Friday 
afternoon to Sunday evening. Mewburn 
gave the students exercises, such as writing 
without using the delete key, to discourage 
perfectionism. A psychologist offered one- 
on-one consultations and tips for dealing with 
challenges such as negative thoughts. And 
students received a large Lego-like block for 
every 5,000 words that they produced. James, 
inspired to dump her thoughts out of her head, 
wrote more than 20,000. 

Having achieved that, she continued to 
attend writing meet-ups. She still had to put 
back her thesis deadline, partly because she 
switched to a part-time PhD programme 
in December 2017, but she submitted her 
78,000-word dissertation in February this 
year. “I found motivation again,” she says. 

James’s story will sound familiar to any 
researcher who has struggled to complete a 
paper, dissertation, grant proposal or book 
chapter. When schedules are crammed with 
laboratory work, teaching or administration, 
scientists often delay writing. “We basically 
find ourselves running around and being busy, 
getting a lot done, but the paper is not getting 
written,” says Olga Degtyareva, founder 
of Productivity for Scientists, a company in 
Dunfermline, UK, that helps researchers to 
overcome procrastination and be more pro- 
ductive. Even when scientists do have time, 
they might endlessly delete and revise, let 
their attention wander, or be so sensitive to 
potential criticism of their ideas that they are 
unable to string sentences together on paper. 

And yet writing is crucial to propelling 
careers. PhD students often need papers on 
their CV to land postdoctoral posts, and publication 
records and grant funding can tilt tenure 
decisions. “You need to be able to show you’ve 
been productive,” says Anna Clemens, a 
Prague-based editor and writing coach for scientists. 

Some researchers, like James, rely on writing 
at meet-ups. Others use professional 
services such as classes or coaching. Productive 
scientists often make an effort to improve 
their writing process, whether by scheduling 
weekly times or using mental hacks to focus. 
Disentangling a paper’s core ideas, breaking a 
project down into bite-sized tasks and finding 
the right software (see ‘Kick-start writing’) can 
ease the pain. 

But the first step is to prioritize writing. “It’s 
very easy to put it last on the list,” Mewburn says. 


A personal system 

Early-career researchers often struggle with 
barriers to their writing, according to a 2019 
study co-designed by Prolifiko, a company in 
Leeds, UK, that offers coaching and a digital 
platform to improve writers’ productivity. 
In 2018, the firm surveyed 593 academics 
KICK-START WRITING 


These software tools and resources might 
boost productivity. 


- Academic Phrasebank. A website 
maintained by the University of 
Manchester, UK, that offers common 
phrases for academic writing 
(www.phrasebank.manchester.ac.uk). 

- Scapple. Software that helps writers to 
chart connections between ideas. 

- Scrivener. Writing software that includes 
features for outlining, dividing a project 
into sections and tracking word-count 
targets. 

- Shut Up & Write! A community of writers 
operated by the non-profit organization 
Writing Partners in San Francisco, 
California, with writing meet-ups in 
47 countries. 

- Freedom and LeechBlock. Productivity 
apps that block distracting websites. 

- TextExpander and TypeIt4Me. Software 
that expands user-specified keywords into 
longer phrases. 


from various disciplines around the world 
and categorized participants by career 
stage: early (up to 5 years’ experience), mid 
(6-15 years), and late (at least 16 years’ expe- 
rience). When asked to choose from a list of 
factors that hindered writing and publishing, 
46% of early-career researchers picked “feeling 
overwhelmed with a lack of control”, compared 
with 33% and 19% of mid- and late-career 
participants, respectively. Procrastination 


was common; one early-career participant 
responded, “I play chicken with deadlines,” 
and another reported, “I get sucked into Face- 
book ... Hours go by and [I’ve] done nothing.” 
Others struggled with negative thoughts. 
“Some days I feel physically sick at writing or 
reading anything that has to do with my PhD,” 
one researcher said. Meanwhile, mid-career 
participants were more likely to cite heavy 
workloads, everyday interruptions and family 
commitments as barriers. 

But career stage wasn't rigidly linked to 
writing success. The team found examples 
of experienced scholars who were ‘miserable 
and blocked’ and younger researchers who 
were ‘super productive’, says Chris Smith, 
co-founder of Prolifiko. What mattered was 
having a system, such as setting a writing 
schedule or asking a co-author to hold the 
researcher to deadlines. People who con- 
sistently used certain tactics to push writing 
forward tended to experience fewer blocks. 
Sixty-one per cent of them reported feeling 
very satisfied with their writing, compared 
with 20% of those who had never thought 
about a system, Smith says. 

Writing systems span many approaches, 
Smith notes. Although the conventional 
advice is to write every day, “it’s not the only 
way”, he says. What’s important is “having a 
personal system that suits you”. Mid-career 
researchers were more likely to set aside 
weekly or monthly slots (a method called 
time-blocking), perhaps because they were 
too busy to write each day. Those who wrote 
daily reported higher levels of satisfaction, 
but time-blocking writers tended to pub- 
lish more, Smith says. People who wrote 


Bec Evans (far right), co-founder of the writing-productivity company Prolifiko, gives a 
presentation at Google in London. 
during holidays or sabbaticals were the least 
satisfied. Smith speculates that they might 
have felt stressed about completing other 
work first, or have unrealistic expectations 
about the amount they could accomplish 
during their time off. 

Daniel Vreeman decided in 2010 that 
he needed a better system. Vreeman, a 
biomedical-informatics researcher then at 
Indiana University School of Medicine in 
Indianapolis, had been writing during gaps in 
his packed schedule. But squeezing in half an 
hour or an hour between meetings was ineffi- 
cient, he says. So, on his calendar he blocked 
off half a day per week for writing; if people 
asked about that slot, he or his assistant told 
them that he had another commitment. Later, 
Vreeman increased the block to a full day on 
Fridays. Although he makes exceptions for 
travel or unavoidable administrative work, 
meeting his goal more frequently “is better 
than not meeting it at all”, says Vreeman, 
who now works at a satellite office in Fishers, 
Indiana, for the non-profit research institute 
RTI International in Research Triangle Park, 
North Carolina. Each year, he writes on aver- 
age 4 papers, 6 grant proposals, 8 conference 
abstracts, 10 blog posts, 10 technical docu- 
ments, 20 grant progress reports and a book 
chapter. 

Researchers with more harried schedules 
might find Vreeman’s method unfeasible, 
but they can still be productive. In early 
2014, structural engineer Eva Lantsoght was 
teaching three new classes at the University 
of San Francisco de Quito in Ecuador. Sched- 
uling more than a 2-hour block for writing 
papers was often “impossible”, she says. So 
Lantsoght broke each paper into small tasks 
and tackled them during 1- or 2-hour slots. For 
motivation, she sometimes used the ‘pomo- 
doro’ time-management technique, which 
involves doing 25 minutes of focused work 
at a time. In this way, Lantsoght published 
eight papers based on her dissertation over 
the following two years. 

After learning to deal with overwhelming 
demands during her physics research career, 
Degtyareva set up Productivity for Scientists 
in 2011. In her online courses, she provides her 
students with productivity strategies, such as 
telling them to choose a target journal and 
download the guidelines and manuscript 
template. “You can literally start filling in the 
blanks,” she says. Students must complete 
one task per day, and other participants hold 
them accountable. 

In 2013, Marina Cortés, then a postdoc in 
cosmology at the University of Edinburgh, 
UK, was feeling uninspired. She would make 
herself try to write a paper by “brute force” 
even when she was tired. After seeing a pres- 
entation by Degtyareva, Cortés signed up for 
the writing class, which, she says, helped her 
to prioritize rest and well-being. She started 


sleeping more and working with greater 
focus for shorter periods. Cortés, now a 
cosmologist at the Perimeter Institute for 
Theoretical Physics in Waterloo, Canada, 
wrote three papers over the course of those 
classes. One was highlighted in a viewpoint 
article that year in the online magazine 
Physics, and another won the 2014 Buchalter 
Cosmology Prize. 

Some researchers feel motivated to 
write by participating in meet-ups. Shut Up 
& Write!, a community operated by the non-
profit organization Writing Partners in San
Francisco, California, runs free writing meet- 
ups year-round in 47 countries. And academic 
researchers can join an online event called 
AcWriMo every November to set themselves 
ambitious writing goals and tweet about their
progress. PhD students can look for thesis
boot camps offered by their institution or 
local facilitators; for example, freelance 
writer and facilitator Peta Freestone, based 
in Edinburgh, designed an early version of the 
ANU boot camp and has since run many such 
programmes in Europe and Asia. 


Find the story 


Sometimes, the problem is not time or 
motivation, but a lack of focus. Degtyareva 
advises clients to choose one paper — say, 
the easiest of the ones they wish to write. And 
Clemens says that scientists should work out 
the problem that the paper addresses and the 
key message before tackling a draft. “I don’t 
think the best strategy is to just sit down and 
start writing,” she says. When Clemens edits 
papers by researchers who haven't done this 
preparation, she sometimes deletes entire 
paragraphs that aren’t relevant. 

Diagramming ideas can help, Mewburn 
says. She recommends using software called 
Scapple to create ‘mind maps’ — concepts 
connected by lines. Mewburn also suggests 
constructing a literature-review matrix, a table
in which each column is a relevant paper and 
each row a theme; scientists should fill each 
cell with what that paper says about that 
theme. Seeing whether each study leaves one 
or more themes unaddressed helps researchers
to identify gaps on which their study could
shed light, Mewburn says. 
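
As a rough illustration of the kind of matrix Mewburn describes, the Python sketch below records what each paper says about each theme and then lists the themes that each paper leaves unaddressed. The paper labels, themes and notes are invented placeholders, not real studies, and the helper names are hypothetical.

```python
# Hypothetical literature-review matrix: rows are themes, columns are papers.
# All paper labels, themes and notes are invented placeholders.
themes = ["sample size", "field validation", "cost analysis"]

notes = {
    "Paper A": {"sample size": "n = 40, single site", "field validation": "lab only"},
    "Paper B": {"sample size": "n = 300, multi-site", "cost analysis": "per-unit estimate"},
    "Paper C": {"field validation": "two field trials"},
}


def build_matrix(themes, notes):
    """Return {theme: {paper: note}} with an empty string for unfilled cells."""
    return {t: {p: cells.get(t, "") for p, cells in notes.items()} for t in themes}


def unaddressed(matrix):
    """Return {paper: [themes with an empty cell]} to highlight gaps."""
    gaps = {}
    for theme, row in matrix.items():
        for paper, note in row.items():
            if not note:
                gaps.setdefault(paper, []).append(theme)
    return gaps


matrix = build_matrix(themes, notes)
gaps = unaddressed(matrix)
for paper, missing in gaps.items():
    print(f"{paper} leaves unaddressed: {', '.join(missing)}")
```

A theme that shows up as a gap for most of the papers is a candidate for the question a new study could address.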

Communicating research is fundamental 
to scientists’ jobs, notes Clemens. “If you're a
scientist, you're a writer,” she says.


Roberta Kwok is a freelance writer in Kirkland, 
Washington. 
COLLABORATIVE WRITING:
BEYOND GOOGLE DOCS 


A small but growing suite of tools allows researchers to author and edit
scientific documents as a team, no e-mail required. By Jeffrey M. Perkel 


Draft scientific manuscripts are
typically confidential. So, when Elana
Fertig was asked to take a look at an
in-development paper on a functional
gene-annotation strategy, she
expected to receive the file in a private e-mail.
What she got was a public announcement, 
shared on Twitter. 

The paper had been written by Olga 
Botvinnik, a computational biologist at the 
Chan Zuckerberg Biohub in San Francisco, 
California, who is an advocate of the global 
movement to make research more accessible. In
November 2019, as Botvinnik started preparing
her paper, she decided to try this open-science
ethos out for herself. “I wanted to walk the walk
of open science,” Botvinnik says. 

Botvinnik managed her paper as if it were 
open-source software. She wrote it in a plain- 
text editor and placed text files alongside 
data sets and code for generating figures on 
the code-sharing site GitHub. She invited her 
four co-authors to submit edits using Git, software
that tracks precisely how and when a file
has been changed. And she used a dedicated
tool called Manubot to render the document
as a user-friendly manuscript, which she then
published online and tweeted to the world. 

Fertig, a computational biologist at 
the Johns Hopkins School of Medicine in 
Baltimore, Maryland, says it was a “funny
experience” to be tweeted an unpublished 
paper. “It’s a very different way of writing than 
the traditional academic science of not putting 
it out before it’s a finished product.” 

Botvinnik’s manuscript was just a shell at this 
point: two of the figures were placeholders, 
and the methods section read, “We did things.” 
But, she says, the fact that the draft was pub- 
licly accessible made it easy to solicit feedback 
from co-authors and the broader community. 
“It’s definitely been very, very helpful to be able 
to show someone, ‘Here’s what I’m thinking so
far. Here are some figures; here’s some text.
What do you think?’”


ILLUSTRATION BY THE PROJECT TWINS. 


Say ‘collaborative writing’ and most 
researchers probably think of Google Docs, 
the ubiquitous word processor that allows 
multiple authors to co-edit a document online
in real time. But Google Docs lacks features 
that some scientists require, such as refer- 
ence management, support for code and data 
and the ability to directly submit articles to 
journals and preprint servers. 

Manubot is one of a small but growing 
number of tools specifically designed for col- 
laborative writing; others include Overleaf, 
Authorea, Fidus Writer and Manuscripts.io. 
These tools not only close some of the key fea- 
ture gaps, but also provide a glimpse of where 
scientific communication might move next. 


Partners in editing 


Most collaborative writing tools offer 
researchers a range of useful functions. Team 
members can keep documents private or share 
them with select collaborators; track changes 
and comment on the text; and edit documents
simultaneously or asynchronously with their 
collaborators. 

Science-focused programs supplement those 
with features aimed at the research community, 
such as built-in citation management. (Some
citation managers, such as Zotero and Paperpile,
can integrate with Google Docs using plug-ins.)
Users can generally import libraries from
reference managers such as Zotero or Mendeley,
or query external databases directly. The ‘cite’
button in Authorea, for example, allows users to
search PubMed or CrossRef, or pull in articles by
DOI or URL. In Fidus Writer, references can be
added from Zotero with a simple drag-and-drop.

Manubot features cite-by-identifier, which 
builds bibliographies using a DOI, a PubMed or
arXiv identifier or a URL, without the need for
a reference manager. Inserting
“@doi:10.1371/journal.pcbi.1007128” into a Manubot article,
for instance, instructs the tool to find and
insert a reference to the paper itself. 
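
To make the idea concrete, here is a minimal Python sketch of how identifier-style citations could be picked out of a plain-text draft. It is not Manubot's own parser: the “@doi:” form comes from the example above, whereas the other prefix spellings, the function names and the draft text are assumptions made for illustration.

```python
import re

# Illustrative only: this is not Manubot's parser. The "@doi:" form comes from
# the text above; the other prefix spellings are assumptions for this sketch.
CITATION = re.compile(r"@(doi|pmid|arxiv|url):([^\s\]\[;,]+)")


def find_citations(markdown_text):
    """Return unique (kind, identifier) pairs in order of first appearance."""
    seen = []
    for kind, identifier in CITATION.findall(markdown_text):
        key = (kind, identifier.rstrip("."))
        if key not in seen:
            seen.append(key)
    return seen


def doi_to_url(doi):
    """Build the doi.org resolver URL for a DOI."""
    return f"https://doi.org/{doi}"


# Invented draft text that cites the same DOI twice.
draft = (
    "Open collaborative writing has been described before "
    "[@doi:10.1371/journal.pcbi.1007128]. We extend that workflow "
    "[@doi:10.1371/journal.pcbi.1007128]."
)
citations = find_citations(draft)
for kind, identifier in citations:
    if kind == "doi":
        print(doi_to_url(identifier))
```

Because every co-author types the same identifier, the repeated citation collapses to a single bibliography entry, with no shared reference library to keep in sync.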

Botvinnik calls this approach “pretty 
magical”, because it circumvents the problem 
of researchers using (and trying to synchronize) 
different reference managers and libraries. “I
like that I can just use the DOI and it works, and
everyone else knows that there is one source of
truth for the citation: the DOI,” she says.

Authorea and Overleaf support LaTeX, the
typesetting language preferred by physicists, 
mathematicians and computer scientists. In 
2017, CERN, Europe’s particle-physics labo- 
ratory near Geneva, Switzerland, adopted 
Overleaf as its preferred collaborative author- 
ing platform; some 4,800 users have signed 
up, says CERN computing engineer Nikos 
Kasioumis. LaTeX is quite an advanced system, 
however, so Authorea and Manubot might be 
better options if a simpler file format is needed.
Both use the plain-text language Markdown. 

Using Authorea and Manuscripts.io, authors
can embed and execute software code in their
articles, and bundle figures together with
the data used to create them — such features 
support computational reproducibility. “The 
intention is that you can create dynamic 
representations of your work, which include 
code, data and figures, and the narrative, 
all versioned together,” says Matias Piipari, 
founder of Manuscripts.io, which (like Authorea)
is now owned by the publisher Wiley.




For those who prefer Google Docs, New 
Zealand-based Stencila is developing a plug-in
that allows authors to enhance documents with
executable code blocks, data tables and equations.
Based on steganography, a cryptographic
trick in which data are encoded in images,
Stencila’s plug-in was written to “bridge that
gap between the coders and the clickers”, says
founder Nokome Bentley. “It’s taking the code
to the environment that clickers are used to.”


Coder workflows 


Manubot, by contrast, tends to appeal to 
coders. Developed in the laboratory of bio- 
informatician Casey Greene at the University 
of Pennsylvania in Philadelphia, the tool was 
designed to manage the writing of a review 
article on deep learning and coordinate its
three dozen authors. The challenge: keeping 
track of which collaborators contributed 
which bit of text, line by line. “We expected 
to have a large number of contributors and 
we wanted to be able to look at the ‘atomic’ 
changes of one person and one group of 
changes,” Greene says. That is, instead of 
navigating a tangled mess of tracked changes, 
Greene wanted to be able to review each 
change individually, and to keep the online 
draft automatically up to date. 
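
Git records changes at this line-by-line level. The minimal Python sketch below illustrates the idea using the standard difflib module rather than Git itself; the file labels and the revised methods sentence are invented for the example.

```python
import difflib

# Two hypothetical versions of a manuscript section. The revised methods
# sentence and the file labels are invented for illustration.
before = [
    "## Methods",
    "We did things.",
    "## Results",
]
after = [
    "## Methods",
    "We sequenced 12 samples and aligned the reads.",
    "## Results",
]

# unified_diff yields two header lines, then '-' for removed lines and
# '+' for added lines, with unchanged context lines in between.
diff = list(
    difflib.unified_diff(
        before, after, fromfile="draft_v1.md", tofile="draft_v2.md", lineterm=""
    )
)

# Keep only the lines that actually changed, skipping the file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```

Each removed or added line can then be reviewed on its own, which is the kind of ‘atomic’ change Greene describes.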

Manubot solves those problems by cobbling 
together various open-source tools, says Daniel 
Himmelstein, a Greene lab postdoc who helped 
to lead Manubot’s development. These include 
Pandoc, which provides file-conversion func- 
tionality, and GitHub Actions, which automates 
functions such as document creation. To set 
up a Manubot project, users clone a dedicated 
GitHub repository to their computer and mod- 
ify it using a standard programming text edi- 
tor, such as Emacs or SublimeText. Changes are 
then pushed back to GitHub, which logs them 
and rebuilds the document in HTML, Word 
or PDF format. Collaborators can modify the 
manuscript by submitting changes in the form
of a GitHub ‘pull request’ (explore our example 
Manubot project at go.nature.com/39eqosg). 
The result is elegant, but complex. 

And all of this extra functionality can require
advanced programming skills. Fertig has
written grant applications using Manubot, and
is comfortable with GitHub. But she won't be
using Manubot to write collaborative papers,
because the level of programming it involves 
tends to be beyond the reach of her clinician 
co-authors. “There’s no way they have the 
bandwidth to pick up Manubot,” she says. 


Easing submission 


Increasingly, developers are fitting these tools 
with features to better encapsulate the scien- 
tific process. Some, for instance, support JATS 
XML, a file format commonly used in scientific
publishing. 

JATS XML is a structured, semantic file format
that provides a rich set of metadata tags for 
article elements such as author names, arti- 
cle sections, funding sources and database 
accession numbers. Giuliano Maciocci, head of 
product and user experience at the open-access 
journal eLife, explains that the format “decouples
the structure of the article from its presentation”,
which makes the data easier to
search, access and manipulate. 
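
A short Python sketch shows what that decoupling can look like in practice. It parses a tiny JATS-style fragment with the standard library and renders the same metadata in two ways. The element names follow JATS conventions, but the fragment is heavily simplified, and the title and author names are invented placeholders.

```python
import xml.etree.ElementTree as ET

# A tiny JATS-style fragment. The element names follow JATS conventions;
# the title and author names are invented placeholders.
JATS = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""

root = ET.fromstring(JATS)

# Structure: pull the tagged elements out by name, not by position on a page.
title = root.findtext("front/article-meta/title-group/article-title")
authors = [
    f"{c.findtext('name/given-names')} {c.findtext('name/surname')}"
    for c in root.iterfind(".//contrib[@contrib-type='author']")
]

# Presentation: the same structured data can feed different renderings.
plain = f"{title} / {', '.join(authors)}"
html = f"<h1>{title}</h1><p>{'; '.join(authors)}</p>"
print(plain)
print(html)
```

Because the author names are tagged rather than merely styled, a search tool or a journal's typesetting system can extract them without guessing from the layout.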

Editors typically build documents by 
converting author-submitted files into a for- 
mat they can publish in, Maciocci says — a 
labour-intensive, error-prone process. To help 
automate this process, eLife is developing a tool
called Libero Editor, which it hopes to release 
this year. Based on the Texture editor, the tool 
will allow eLife staff and authors to create and 
work with JATS XML documents from beginning
to end. Manuscripts.io can already import 
JATS-formatted content, Piipari says, and it, 
together with Fidus Writer and the Stencila 
plug-in, can export to that format as well.

Authorea allows authors to directly submit 
articles to around 41 journals and preprint 
archives, according to founder Alberto 
Pepe — and to embed interactive figures, 
executable code and data. Roberto Peverati, a
computational chemist at the Florida Institute 
of Technology in Melbourne, was asked 
to contribute to one such journal, Wiley’s 
International Journal of Quantum Chemistry,
in part to test drive Authorea. “I found it really 
very pleasant,” Peverati says. 

As such tools gain traction, scientific articles
become ever more dynamic - and responsive. 
On 20 March, Greene’s postdoc researcher 
Halie Rando created a Manubot project to 
try to make sense of the exploding COVID-19 
literature. Within days, dozens of researchers 
had expressed interest in contributing. “With 
something as fast-moving as COVID-19, we 
have an urgent need for consilience, but many
members of the scientific community are more 
isolated than usual,” Rando explains. Manubot 
provides a forum for these far-flung research- 
ers to work together. “We hope to update it 
rapidly as new information emerges.” 


Jeffrey M. Perkel is technology editor at 
Nature. 
Everything’s cold when you're handling
an Arctic Ocean ice core about 
500 kilometres from the North Pole: 
your feet, your fingers, your face. 
The air temperature here can be as 
low as -35 °C. At that point, every part of me 
wants to go back inside the RV Polarstern — a 
German icebreaker and my research home — 
to take off my wet latex gloves and warm up. 
I convince myself that I'll be OK. Another
researcher watches me closely for signs of 
frostbite, and I’m watching her too. A crew 
member keeps an eye out for polar bears. I 
summon my self-control and finish my work 
— measuring the ice’s temperature, salinity 
and methane concentrations. It’s all part of 
my PhD research on marine geochemistry 
at the Alfred Wegener Institute for Polar and 
Marine Research in Bremerhaven, Germany. 
I spent from September to December 2019 
on the Polarstern as part of an expedition
called MOSAiC (Multidisciplinary drifting
Observatory for the Study of Arctic Climate). 
This multinational project, running till 
September, is the first year-round expedition 
to explore climate in the far north — one 
of the largest uncharted areas in climate
research. About 60 researchers and 40 crew 
members live on board at a time — a small 
community drifting along with the ice pack. 
This picture was taken in mid-afternoon on 
10 November, well after the Sun had dropped 
below the horizon for the long polar night. 
I’m from Chile, and I didn’t grow up
around a lot of cold, snow and ice. But I’ve
learnt to embrace it. During my free time, I 
would sometimes wrap up in heavy winter 
wear for short walks on the ice with some 
friends and a polar-bear guard. We were 
too far north to see the northern lights, but 
the ice glowed in starlight and moonlight. 
Sometimes we didn’t even need headlamps. 
I feel lucky to be a part of this tremendous
expedition. It’s an adventure. And I also 
have a lot of time to think. Time moves at a 
different speed on the ice. 


Maria Josefa Verdugo is a PhD student 
in marine biogeochemistry at the Alfred 
Wegener Institute for Polar and Marine 
Research in Bremerhaven, Germany. 
Interview by Chris Woolston. 


